





























OPINION


FROM THE EDITORS 


One HTML, under W3C 


The World Wide Web Consortium
has advanced HTML5 to a candidate
recommendation, essentially saying
that the 5.0 version of the specification
is locked down and complete.

This should be good news, especially 
in the Bring-Your-Own-Device world, 
where HTML5 and the accompanying 
W3C Web specifications have become 
the platform of choice for developers 
creating applications that must run on 
desktops, tablets and smartphones, 
with different operating systems and 
different form factors, all while maintaining
an excellent user experience.

Unfortunately, the specification is not 
locked down, as the W3C states. That’s 
because another group—the Web 
Hypertext Application Technology 
Working Group (WHATWG)—is also 
working to advance HTML. The 
WHATWG split off its efforts from the 
W3C back in 2004, when it believed that
the W3C had given up on HTML to
focus its efforts on XML and XHTML. 

In 2007, the W3C adopted the 
WHATWG specification as HTML5. 
And according to W3C communications 
director Ian Jacobs, the groups have 
been working together, and have agreed 
that WHATWG’s work will be called the
“living standard,” while the W3C’s work
will be called a “snapshot” of the specification,
frozen at a point, put through
review and comment, and then earning 
status as a W3C recommendation. 

The idea of a living standard, 
according to WHATWG editor Ian 
Hickson, is that it is constantly receiving
additions and being cleaned of
bugs. Yet it is the rigor of the W3C’s
process that makes a specification trustworthy
and reliable.

This bifurcation is not good news for
developers, their managers, or the industry
as a whole. Now, browser providers,
device manufacturers and application
developers will have to consider which of
these supports which standards. If you
write to some of the new features in the
“living standard” that aren’t yet part of a
W3C specification, and the browser or
device doesn’t support them, the website
or application will not render as intended.
And as we’ve come to learn, if an
application’s performance or user experience
is degraded because of disparities
in the specification, everyone loses.

The groups forked off over XHTML. 
The W3C has since acknowledged that 
was the wrong path to take. The sides 
need to put their egos and control issues 
aside, and work to advance one true 
HTML5 specification. 

HTML5 has the potential to make
the lives of developers creating cross-platform
applications a whole lot easier.
The specification itself is quite complex,
both to implement within a
browser, and also to target as a developer.
Let’s not let the specification itself
create unnecessary issues.


What we expect in 2013 


Breaking news headline: The world
is changing. No, really. Who would
have believed that at the end of 2012,
Microsoft would be hailed as an innovator,
Apple would be reeling from a
Maps fiasco, Hewlett-Packard would 
be imploding, IBM would be quietly 
printing money, and Oracle wouldn’t be 
involved in a hostile takeover bid? 

Well, okay: We could have predicted 
some HP-oriented bad news. 

We will now go out on a limb with 
predictions for 2013. If we are correct, 
we expect to be lauded as true visionaries.
If we are wrong, well, never mind:

• Windows 8 will gain traction with
consumers. Windows 8 is nothing if not
controversial, but by mid-year, we
expect consumers to love the touch-screen
capabilities of their desktops,
notebooks and tablets. The user experience
is truly innovative, and for casual
users, it will prove compelling.

• Windows 8 will not gain traction with 
enterprises. Too much software, too
much training, too little benefit:
Enterprises will stick with Windows 7, or find
a way to disable the new user interface 
and stick with the Start button. 

• Windows Phone 8 will begin to make 
inroads, but Android will continue its 
lead, and iOS will fall behind. Unless it 
has a breakout product, Apple will be 
hurt by its endless stream of incremental 
upgrades. Meanwhile, Google’s aggressiveness
and the creativity shown by
Samsung, LG and others will push 
Android forward. 

• The cloud will become so ubiquitous 
that we will stop talking about it. The 
novelty will wear off from doing builds
in the cloud, using cloud-based source-code
control systems, using a cloud-based
IDE, testing through the cloud,
leveraging cloud-based storage, or 
deploying into the cloud. It will simply 
be how software is done. 

• The same is true with agile ALM. 
Enterprise use of agile methodologies 
and application life-cycle management 
tools will be simply assumed. 

• Development teams will focus on testing.
Whether driven there by mobility or
by the cloud, we will see decreased corporate
and customer tolerance for buggy
or insecure software. Developers will get
more test training, more testers will be
hired, and more outsourced testing services
will be launched. It’s about time.

• The global economy will continue to 
recover, and investment in software (and 
software development) will lead the 
pack. 2013 will be an excellent year to be 
a software developer, development manager
or software entrepreneur.

Happy New Year!
Editor’s Note: All of this issue’s letters
are in response to “The Trouble with
Gerrold: Windows Ate,” available at
sdt.bz/37205.

Who’s Windows 8 
made for? 

David couldn’t be more spot-on in
pointing out the “desktops are for production,
tablets for consumption”
dichotomy. I have read a number of
comments about how “awesome” Windows 8
is with a touch-screen.

I spend hours every day on sound
recording/editing software. Just as a
test, I pretended to run through a few
simple tasks on my monitor, as though
it were a touch-screen. After about 90
seconds, I realized I would have to be
holding both arms suspended in midair
all day... or place a touch-screen flat
on my desk, with my head tilted down
at a cramp-inducing angle, and still
have to move the touch-screen back a
foot every time I needed to use a
keyboard-based program that was not
made for Windows 8.

Like David, I have used Windows 
since it was DOS on five floppy disks, 
and this is the first time I feel they have 
made the corporate decision that I am
no longer their target market. Am I
missing something here? 

Tom Kane 

United States 

Whippersnappers 
and their Windows 8 

I agree with you here, David, especially
the part about production versus consumption.
Should the gap be bridged?
Remembering back to when DOS was
removed from Windows, I had a bad
feeling. I would have to really work to
get to certain files, or to hand-remove a
virus from a machine. I can see someone
in an office needing IT help, and
not even knowing what files and folders
are.

XP was just fine. We totally avoided 
the WTF that was Vista, especially the 
more we heard from high middle knowledge
users about the weird things it did.
We don’t want to spend all our time trying
to get things transferred over, we just
want to sit down and get to work. 

Now Windows 8. It boots fast? And? 
You assume I even turn off my 
machine... Another interface that may 
or may not be familiar? The iDon’t and 
the Droid interface changes with every 
upgrade. How many “upgrades” will we
get under Win 8? Will we meet or
exceed the XPSP2, or will we just 
rename it root beer, then sarsaparilla, 
and then start shaking our canes at 
these whippersnappers? 

We did go to Win 7 with the newest 
laptop and desktop, and we really like 
it. The TRUE plug and play, where it 
automatically finds what I need to 
print, that was neat. Not really being 
able to share two computers no matter 
how we give permissions ain’t so great. 
And of course, no help, since BOTH
the online store and the big box stores 
do not send the Windows registration 
paperwork with the computer. So make 
the magazine cover as glossy as you 
want, Windows. Just know that there 
are some niggling matters in the back 
pages that still haven’t been addressed. 

Brandy 

United States 

A phantom too far 

Microsoft is chasing a phantom. 
They’re betting the farm on desktop 
and mobile interfaces converging. I 
believe they are wrong; the use cases 
for the two are sufficiently different 
that no single interface will ever be ideal
for both, and users don’t seem to be
clamoring for unification. 

Touch-screen desktop monitors 
don’t really make sense, at least not for 
content creators. (A desktop computer 
that is basically used as a TV with benefits
is another matter.) The typical
desktop monitor is too big and too far 
away for a touch-screen to be anything 
but an ergonomic nightmare, not to 
mention the fact that serious desktop 
users don’t stop at one monitor. 

Shirley Dulcey 

United States 


What do you think? 

Letters to SD Times should include the 
writer's name, company affiliation and 
contact information. Letters become the 
property of BZ Media and may be edited. 
Send to feedback@bzmedia.com. 
► The Windows 8 column that started it all

David Gerrold kicked off quite a response with his column on the Windows 8 release. 
Can Microsoft bridge the gap between the desktop and mobile worlds? David 
doesn't seem to think so: “The desktop machine is designed for production, the
tablet is designed for consumption. On your desktop, you edit and produce spreadsheets,
databases, books, music, and video. On your tablet, you consume them.” Do
you agree or disagree? You can join in on the discussion at sdt.bz/37205.

► Slow and steady wins the maps race 

The Apple/Google maps fiasco was a mistake for Apple, but it was also a show of 
how good Google's processes are, according to Chris Barylick. “Love or hate Google,
it's come through with something worth mentioning here, and where Apple
arguably rushed something completely new without adequate testing and
feedback out the door, Google decided to bide its time in
its Mountain View lair and come out with something
commendable,” he wrote. You can read more at
www.sdtimes.com/blog/2127.



























Three scoops at AnDevCon IV

Amazon, Facebook and Google give developers updates on their services 



Google's Romain Guy and Chet Haase go over new features in Android 4.2. 


BY SUZANNE KATTAU 

What’s new in Android 4.1 and 4.2, how
to handle the challenges of scaling 
Android, and when it’s best to use 
HTML5, native or hybrid coding in 
Android development were among the 
topics that attendees were treated to 
during three AnDevCon IV conference 
keynote speeches in December. 

(AnDevCon is produced by BZ 
Media, which also publishes SD Times.) 

Google software engineers Romain 
Guy and Chet Haase discussed the 
changes in both Android 4.1 and 4.2. In 
4.2, they said nested fragments have 
been added, and developers now have 
more control over the animations that 
happen between activities. Navigation 
has also been made more robust. 

External Storage Access is a new
permission that was added in 4.1
but is not yet required in 4.2; Guy and
Haase recommended that developers
start using it immediately anyway.
Developers should check for this when
they need to access external storage; if
the user hasn’t enabled it, he or she will
have to, as it will be required as a security
option for access to that storage.

In 4.2, shape drawing and in-layer
processing are now both faster. There are
now widgets for wide-screen support, as
well as enhancements to the Property
Animation and LayoutTransition features.
Google also added two new methods
(withStartAction and withEndAction), which
let developers more easily sequence
ViewPropertyAnimator objects.

And, finally, Guy and Haase said they 
worked hard on system-wide memory 
management, especially in the graphics 
subsystem, along with a new feature 
called database query cancellation. 

Facebook's challenging environment 

Facebook’s director of mobile engineering
Mike Shaver discussed how Facebook
handles challenges of scale within
the company. He stated that scaling
Android successfully involves mastering
multiple areas, including handling device
and user diversity as well as app content.

Shaver said that Facebook’s app is
used in a lot of different ways, considering
that users have different content mixes,
preferences, networks and priorities.
Because of how widely Facebook is used
on mobile devices, and because of limitations
on mobile networks, he said that
Facebook’s data usage is really important.
Carriers Facebook works with have
told him that the app uses too much data.

Facebook also deals with the scale of 
what Shaver called “diversity.” “Android 
presents a rainbow of software and 
hardware environments in which your 
application might find itself unexpectedly,”
he said. “We actually ran out this
sort of chart of what our tail looks like. 
We take the most popular 100 Android 
devices that use our app, and we get to 
right about a third of our users—and 
then the tail gets ridiculously long.” 
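
The "long tail" Shaver describes can be measured with a simple coverage calculation. The following is a minimal sketch, not Facebook's method: the function name `top_n_coverage` and the sample device names are hypothetical, and it assumes one device-model entry per user.

```python
from collections import Counter
from typing import Iterable


def top_n_coverage(device_models: Iterable[str], n: int) -> float:
    """Return the fraction of users whose device is among the n most common models."""
    counts = Counter(device_models)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(n) yields (model, count) pairs for the n largest counts.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total


# Hypothetical sample: one entry per user, naming that user's device model.
sample_users = ["phone-a"] * 5 + ["phone-b"] * 3 + ["tablet-c", "phone-d"]
print(f"top 2 models cover {top_n_coverage(sample_users, 2):.0%} of users")
```

In Shaver's description, calling such a function with n set to 100 would return roughly one third; the remaining two thirds of users make up the tail.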

Shaver also discussed hardware acceleration
on many devices. “If you hardware-accelerate
the container in which
your WebView appears, it will work pretty
well and will get accelerated, but you
will no longer be able to apply Transforms
or scaling,” he said.

Amazon wrestles with HTML5 

HTML5 has performance challenges
today, Amazon’s director of app developer
services Ethan Evans said, so why use
it? He said it’s because the benefits of
rapid updates without requiring app
upgrades outweigh the problems. He
said there are two driving reasons to consider
HTML5: Write once, run anywhere;
and the public promise of cross-platform
compatibility.

Where is HTML5 best used today? 
Evans said it lends itself to more static 
or low-interactivity tasks, so developers 
will face more challenges when using it 
in things like games. High-interactivity 
apps are challenging places in which to 
use HTML5, he said.

As for native code, by definition it’s
platform-specific, Evans said. If developers
plan to support multiple platforms,
they will have to port. All
updates will require upgrades as well.

Later on in his keynote, Evans went
deeper into what Amazon has done
with regard to HTML5 vs. native vs. blended.
He explained the Amazon AppStore
Hybrid Architecture. Features were
grouped from a practical standpoint.
Presentation Logic and Business Logic
were in HTML5; search bar, auto-complete
and app DRM were in native
code; and dialogs were in blended.

Evans concluded with Amazon’s 
future, which entails an experiment with 
server-side rendering that involves 
developers preassembling pages to take 
the burden off the device CPU. Amazon 
said that the experiment so far shows
promise.


Photo by Jay Kelly 







www.sdtimes.com | January 2013 | SD Times | NEWS | 15

W3C: HTML5 is ready for prime time 

Specification is now a candidate recommendation 


BY DAVID RUBINSTEIN 

HTML5 is locked down. Developers 
have a stable framework for the next 
several years and know what they can 
rely on in the specification, according to 
the World Wide Web Consortium 
(W3C) that oversees HTML5, part of 
the Open Web Platform. 

The timing of December’s publication
of the complete definition of
HTML5, and its announcement as a
candidate recommendation, is important,
according to the W3C’s head of
communications Ian Jacobs, as developers
continue to build applications
and websites with HTML5. Along with 
that announcement, the W3C also 
announced that HTML 5.1 is now a 
working draft. 

Jacobs said improved video on the
Web (driven by the television industry)
and HD support for image functions
are among the new features for the next
version of the specification. And, the
first draft of Canvas 2D Level 2, the
drawing API that remains part of the
HTML5 specification, was also
announced.

HTML5 release train

Jacobs cited a recent survey completed
by Kendo UI (a division of development
tools company Telerik) that
showed that 82% of about 4,000 developers
surveyed believe that HTML5 is
important to their job within the next 12
months. Further, 63% of developers
responding to the survey say they are
already actively developing with
HTML5, due to familiarity of language,
cross-platform support and performance.
Only 6% of respondents said they
had no plans to use HTML5 in 2012.

Still, even as HTML5 gains stability,
95% of developers responding to the
Kendo UI survey said they have some
level of concern about browser fragmentation.
Jacobs said the next big
effort over the next few years will be to
meet broad interoperability requirements.
“If you follow the data, there’s
strong interoperability already,” he
pointed out.

“But the W3C’s goal is not just interoperability
across browsers, but across
devices used. There are lots of producers
of HTML in the world...content-management
systems, e-mail clients.
It’s a big effort; we have had support
from lots of different industries.

“It’s useful to note there are some 
80 companies in the working group, 
and there’s interest across the consortium
for strong interoperability,”
Jacobs continued. “The TV industry is 
keenly interested in a robust platform. 
They’re very keen on HTML testing. 
We have a modest test suite for 
HTML5, but we’re going to redouble 
our efforts.” 

Jacobs noted that the W3C continues
to collaborate with the Web
Hypertext Application Technology
Working Group (WHATWG), which
formed because it viewed the W3C’s
progress on HTML as too slow. So the
W3C is working on the “snapshot” version
of HTML, while WHATWG continues
to advance what it calls the “living
standard.”

In the announcement, the W3C 
“recognizes the work of Ian Hickson 
(from Google), who has been the editor 
of the HTML5 specification for nearly 
all of its existence.” Hickson is a part of 
the WHATWG effort. 

“It’s a complex and fascinating mix of 
things that lead to a standard,” Jacobs 
said. “A well-written document is 
important, a test is important. The 
W3C feels building a consensus is 
important. But actual implementation 
is important. The WHATWG group 
believes what is implemented is super 
important. But there are a lot of factors 
involved (to create a standard)...discussion,
review, integration with other
technologies, timing. Having innovators 
out there being able to work quickly on 
new ideas, seeing what’s being taken 
up, is a great approach. Sometimes 
there are big successes, and sometimes 
we miss as well.” 

As for the rest of the Open Web 
Platform, Jacobs noted progress in a 
number of different technologies. 

The CSS Working Group, for example,
is looking at features that let applications
work well on tablets and APIs
for device orientation. Also, the HTML 
Working Group has taken on work 
around responsive images, which will 
give the design community a better way 
to serve images in a website depending 
upon the context of the site, he 
explained. “It’s a useful complement to 
responsive design,” he said. 

The W3C in December also published
a recommendation for the Web
Open Font Format, which defines fonts
and enables them to be embedded in a
Web page.


The current HTML working group charter was issued on March 7, 2007. The group is 
chartered to continue its work through Dec. 31, 2014. 

Per the Plan for 2014, the milestones are as follows: 
Note: The 2014 plan calls for a short Last Call for HTML5 in Q3 of 2014, prior to the move
to PR in Q4 of 2014. 

Source: W3C 
Putting the brakes on agile gone wild

VersionOne returns focus to developers with TeamRoom 


BY DAVID RUBINSTEIN 

Agile development might have gotten
away from itself. Originally intended to
define how development teams can
work in a responsive way (delivering
software in shorter cycles and emphasizing
people over processes and tools),
agile has exploded into the enterprise,
with new tools providing visibility and
actionability geared to managers and
stakeholders, not only developers.

With the fall release of its eponymous
software, VersionOne is taking a
step back by introducing TeamRoom, a
scaled-down, lighter-weight version of
the software that meets developer
needs first and foremost, while still providing
insights for project managers
and data managers. TeamRoom ships as
a part of the VersionOne product.

“If I think about agile and go back 
five years, it was very much a developer 
and development-team phenomenon,” 
said Robert Holler, CEO of VersionOne.
“But the pendulum has swung
over to management with a big M, and 
how do we transition some of that back 
so the team has a better environment 
[to work in]?” 

TeamRoom provides the storyboards, 
task boards and work-in-progress limits 
developers need, and lets them select 
and display only the information that’s 
important for them to complete their 
jobs, Holler explained. Team mascots 
and personal avatars help customize the 
view, and still provide managers with the 
visibility they need, he added. 
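
A work-in-progress limit caps how many items may sit in one column of a board at a time. As a rough illustration only (this is not VersionOne's implementation; the function name `wip_violations` and the board data are hypothetical), a check for violations might look like this:

```python
from collections import Counter
from typing import Dict, List, Tuple


def wip_violations(cards: List[Tuple[str, str]], limits: Dict[str, int]) -> Dict[str, int]:
    """Map each over-limit column to the number of cards above its limit.

    cards is a list of (card title, column name) pairs; columns without an
    entry in limits are treated as unlimited.
    """
    per_column = Counter(column for _, column in cards)
    return {
        column: count - limits[column]
        for column, count in per_column.items()
        if column in limits and count > limits[column]
    }


# Hypothetical board: three cards in progress against a limit of two.
board = [
    ("Login page", "In Progress"),
    ("Search", "In Progress"),
    ("Export", "In Progress"),
    ("Billing", "Testing"),
]
print(wip_violations(board, {"In Progress": 2, "Testing": 2}))
```

A board tool would typically surface the returned columns as warnings so the team finishes work before pulling in more.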

The focus of TeamRoom, Holler
emphasized, is the team. “How do we
create an environment that’s tailored just
for the team, almost at the expense of the
rest of the organization? How do we
remove the noise? How do we remove
the additional overhead? How do we get
rid of all the things they might not necessarily
need to do their daily jobs, and give
them just what they need?” he said.

A work-in-progress violation helps developers see what limits they're working with.

“When I log into my TeamRoom, all 
I’m looking at is my upcoming backlog 
and the items in a storyboard or task 
board that are in process. In context of 
that, can I see the communications 
going on with respect to those work 
items, and can I see what’s recently 
changed, and can I answer questions in 
my daily standup as to what’s happening 
and what I worked on in the last 24 
hours, and really turning that into the 
focus as opposed to the enterprise 
ALM focus.” 

Holler said this release differs from 
what’s been done in the past, in that past 
releases have been additive. “Let’s add 
this in, and let’s add this in, and let’s 
make it for bigger organizations, and let’s 
manage multi-project, multi-program,
cross-departmental stuff,” he said.

“We’ve been doing a lot of work with 
agile portfolio management [and other 
enterprise-level capabilities], and we 
believe those are absolutely important, 
but we also felt like there was another 
end to the spectrum that was absolutely 
important, if not core, to agile. At some 
point, someone’s got to start paying 
attention to that, and we decided that 
was going to be a point of differentiation
for us and a focus for us.”

The fall release also reflects additional
enhancements made around Kanban and
CA Clarity integration (for time reporting),
as well as some new visualizations,
which Holler said cannot be done justice
by speaking about them. “I’ll simply
say that, ‘Hey, I’ve got a story, and that
story came from an epic that came from
a higher-level epic, that broke down into
tasks and tests and has dependencies
among other stories, and wouldn’t it be
great to see that visually?’ That’s all I’ll
say. You’ll see a picture and you’ll go,
‘OK, I get it.’ ”
Raspberry Pi: A complete PC
for minimum computing 

Affordable ARM-based hardware works for many devices



The Raspberry Pi is a tiny ARM-based computer, which means it can run regular applications in 
a tiny form factor. 


BY ALEX HANDY 

Small processors on tiny circuit boards 
have been all the rage in the embedded 
community for years. The Arduino 
micro-controller has become the 
default prototyping environment for 
electronics and embedded users alike, 
but the platform has always been 
specifically targeted at existing embedded
developers. But the Raspberry Pi
has changed that paradigm. 

While the Arduino is based on a 
microcontroller, the Raspberry Pi is 
actually a complete PC, yet the board 
itself is no larger than a stack of credit 
cards. While users of the Arduino have 
to write all their software from scratch, 
a Raspberry Pi can run Linux or Windows,
as well as any applications written
for those platforms.

That means building a webcam
would be much easier on a Raspberry
Pi, where you could install existing software
and camera drivers, write a shell
script to push the images to an HTTP
directory, and serve the whole thing
to the Web with Apache, all on a local
Raspberry Pi. And best of all, the Raspberry
Pi costs only US$25 apiece.
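
The "push the images to an HTTP directory" step can be very small. The article describes a shell script; the sketch below uses Python instead, and everything in it is an assumption for illustration: the function name `publish_latest`, the idea that a separate capture tool writes `.jpg` files into a capture directory, and the web directory that Apache would serve.

```python
import shutil
from pathlib import Path
from typing import Optional


def publish_latest(capture_dir: Path, web_dir: Path, name: str = "latest.jpg") -> Optional[Path]:
    """Copy the most recently modified .jpg in capture_dir to web_dir/name.

    Returns the published path, or None if no image has been captured yet.
    """
    images = sorted(capture_dir.glob("*.jpg"), key=lambda p: p.stat().st_mtime)
    if not images:
        return None
    web_dir.mkdir(parents=True, exist_ok=True)
    target = web_dir / name
    tmp = web_dir / (name + ".tmp")
    shutil.copyfile(images[-1], tmp)
    # Rename into place so the web server never serves a half-written file.
    tmp.replace(target)
    return target
```

Run from a scheduler such as cron, a call like `publish_latest(Path("/home/pi/captures"), Path("/var/www/cam"))` (both paths hypothetical) would keep a single, always-current image at a fixed URL.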

The project began in 2009 with the 
formation of a non-profit known as the 
Raspberry Pi Foundation. The goal was 
to spark interest in computing and computer
science in the classroom by providing
a low-cost platform for teaching
those topics. 

Eben Upton, technical director and 
application-specific integrated circuits 
architect at Broadcom, is the founder 
of the Raspberry Pi Foundation. He 
said it all began at the University of 
Cambridge. 

“It was an attempt by a group of us at 
the University in Cambridge to reverse 
the decline in the numbers and skillsets 
of applicants taking computer science 
at the University,” he said. 
“Obviously it’s grown beyond
anything we could have imagined,
in large part because
of the interest from the
maker/hacker community.”

That enthusiasm has seen 
home users plug Raspberry 
Pis into their media centers, 
place them on remote-controlled
vehicles, or use them
in their home-security and
automation systems. Because
it is a full computer, there are
no limitations on what it can be used for.

“I think it eliminates a lot of the scale 
advantages in consumer and industrial 
electronics, and also brings embedded 
programming within reach of engineers 
with more traditional desktop or enterprise
skillsets,” said Upton. “Compared
to a microcontroller, what advantages
does the Pi offer? Much more memory,
much higher performance, and the ability
to run a ‘real’ protected-mode operating
system.”

And that opens up a lot of possibilities.
Each Raspberry Pi has a built-in
HDMI port, so they can easily be used 
to stream video across a network and 
onto an HDTV. Each Raspberry Pi also 
has a USB port and an Ethernet port, 
meaning each one can be attached to 
sensors and to the network. 

And while the Raspberry Pi continues
to evolve (the foundation is working
on more versions with different
attachments and speeds), it’s certain
that the platform will continue to
adhere to the low price point and small
form factor.



Media devices are particularly compatible with Raspberry Pi. 
Riak offers cross-data-center replication


BY ALEX HANDY 

Basho, the company behind the Riak
key-value store, announced in December
that the enterprise edition of Riak
will offer cross-data-center replication
capabilities.

These new capabilities build on the 
existing Riak Cloud Storage product. 
This enterprise product wraps additional
capabilities around Riak, allowing it
to be used through an API modeled after
Amazon’s S3 storage service. Riak
Cloud Storage is also multi-tenant, and 
can store objects by splitting them into 
smaller chunks and replicating those 
pieces across the database cluster. 

Riak is designed to function with 
multiple nodes, typically starting at five 
nodes. All data on those nodes is replicated
at least three times. Because of
Riak’s architecture and use of multiple 
nodes, a cluster is always available for 
reads and writes, with no locking or 
blocking taking place. 
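
To make the replication idea concrete, here is a toy sketch of how a key might be assigned to three of five nodes by walking a hash ring. It is an illustration only, not Riak's implementation: the function name replica_nodes, the parameter n_val and the node names are all assumptions, and a real cluster handles partitions, failures and rebalancing that this ignores.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")


def replica_nodes(key: str, nodes: list[str], n_val: int = 3) -> list[str]:
    """Pick the n_val nodes that would hold copies of `key`.

    Nodes are placed on a ring by hash; the key's copies go to the first
    node clockwise from the key's own hash and the nodes that follow it.
    """
    ring = sorted((_hash(node), node) for node in nodes)
    positions = [position for position, _ in ring]
    start = bisect_right(positions, _hash(key)) % len(ring)
    count = min(n_val, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

With five nodes and n_val left at 3, any single node can drop out and two copies of every key remain reachable, which is the property the article describes as the cluster staying available for reads and writes.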

But because Riak is a cluster-based
database, replication across locations
can be tricky. That’s why Basho added 
cross-site replication for entire clusters. 

Andy Gross, chief architect of 
Basho, said, “For the enterprise customers,
a big global company can have
data close to different continents, and 
for large service providers, they can use 
the multi-service capabilities of Riak to 
build regional zones.” 

Meanwhile, in the NoSQL market, 
distinct lines have formed along the 
database providers. Some, such as 
Couchbase and MongoDB, are gaining 
steam from developers while encountering
difficulties from IT. Others, like
Cassandra and Riak, are being brought
in from the IT side, and it’s the developers
who have to adjust.

Shanley Kane, director of product 
management at Basho, said that developers
coming to Riak are generally struggling
with the shift in concept that comes
from moving from a relational database
to a key-value store. “The biggest barrier
to entry is to get people to think purely
about keys and values,” she said. 
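
A small sketch may help show what that shift looks like in practice. Below, a plain Python dictionary stands in for the key-value store; this is not the Riak client API, and the key formats and the function names put_user and users_in_city are made up for the example. The point is that a lookup a relational database would answer with a WHERE clause has to be planned as its own key.

```python
import json

# Stand-in for a key-value store: string keys, opaque string values.
store: dict[str, str] = {}


def put_user(user_id: int, name: str, city: str) -> None:
    """Store a user as one JSON value under a predictable key."""
    store[f"user:{user_id}"] = json.dumps({"name": name, "city": city})
    # There is no WHERE clause, so "find users by city" needs its own key.
    # This sketch does not handle a user later moving to another city.
    index_key = f"users_by_city:{city}"
    ids = json.loads(store.get(index_key, "[]"))
    if user_id not in ids:
        ids.append(user_id)
    store[index_key] = json.dumps(ids)


def users_in_city(city: str) -> list[dict]:
    """Answer the by-city lookup with two rounds of key fetches."""
    ids = json.loads(store.get(f"users_by_city:{city}", "[]"))
    return [json.loads(store[f"user:{user_id}"]) for user_id in ids]
```

Every access path is decided when the data is written, which is the adjustment developers coming from relational databases tend to find hardest.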

Gross added that, while Riak is a pure 
key-value store, it also offers some features
that ease development. “One of
them is search. You can make a Riak cluster
look like an Apache Foundation Solr
cluster from the client point of view,” he
said. “You can write Map/Reduce jobs in
Erlang or JavaScript, and run a distributed
query across that.”

But there is one area that does cause 
trouble for developers: test environments.
Because Riak is designed to work
with at least three nodes, it’s tough to fire 
up a test cluster on a local workstation. 

Erlang is in there because Riak is 
written in it. That’s one of its strong 
points, said Gross. And because of these 
technological underpinnings, Kane said 
that Basho has been an enterprise company
for some time. The deals Basho is
closing are too expensive for startups, but
she added that telcos and large businesses
have been eager to embrace Riak.
Embarcadero tool gains multi-device
compatibility for native development 


BY DAVID RUBINSTEIN 

The old Borland development tool
C++Builder received a new architecture
in December called native multi-device
development, along with a new
name: C++Builder XE3.

The product is the result of work 
that began in 2008, when Embarcadero 
purchased the CodeGear tool division 
of Borland, according to Michael 
Swindell, senior vice president of marketing
and product management at
Embarcadero. It was then that the
company saw the potential of delivering
applications on multiple devices, and
the need to empower developers to do
that efficiently and cost-effectively.

“Over the years, Windows has dominated
the client landscape. All the way
back in 1999, it was Windows 98 everywhere;
you wouldn’t see anything else,”
Swindell explained. “Five years ago, it 
was Windows XP, but we also started 
seeing a lot of Web applications being 
driven by Java servers and Web servers. 
When it comes to client devices, though, 
pretty much Windows PCs were it. And 
that really follows what our products 
always have been about, going back to 
Borland and even until as recently as last 
year—pretty much a Windows focus. 

“But there’s been a change in the 
client landscape. We say it’s the client 
revolution, but it’s really the dominance 
of Windows as a single-client environment
is changing very quickly,” he continued.
“It’s something we really haven’t
seen in the modern computing era. I
was in a meeting [recently] over in Brisbane,
and there were three companies
in the meeting, and we’re all working 
on the same document, the same data. 
There were two Lenovo Windows PCs, 
two MacBook Pros, two iPads...and a 
Motorola Xoom. And that was the client 
mix in the room. And there was not 
even a mention of it in the room. That 
is the world that we’re in now.” 

What C++Builder XE3 does, said
Swindell, is enable developers to target
multiple devices from a single C++ codebase.
This is accomplished by the tool’s
C++ compiler, which generates a native
Intel application (not a wrapper application,
he emphasized) that can be
deployed to any device, and gives users
the experience expected from the device.

“For this new client device world, 
native is key to the preferred types of 
applications that users want to use,”
Swindell said. “Virtual code platforms,
like Java and .NET on the server, work 
well for that environment because the 
primary driver for those platforms was 
code safety and protection...because 
these are enterprise applications that are
going to be running large amounts of critical
data, with many users accessing them.

“We’re making the point that native 
code is really the choice for the new 
client,” he added. “Java and .NET are 
great for server applications, with 
ASP.NET and various Web frameworks, 
but for these client devices, the user 
experience is critical, as well as the ability
to target all these devices. Java’s not
available on all the devices, and .NET’s 
not available on all the devices. Native’s 
really the best way to target that.” 

Swindell said the release targets 
three platforms: Windows PC and Windows
slate devices, and Mac, all running
on Intel processors. Support for ARM-based
devices is expected next year, after
support for Android and iOS systems. 

The traditional approach to dealing 
with new platforms as they are delivered
has been to add another development
team with another set of tools,
and fund them through the revenue (or
the opportunity for revenue) from the
new product. But, Swindell noted,
every time you add a team, tools and
new technology, you’re adding to your 
scheduling, development time and cost. 

C++Builder XE3 is compliant with 
the C++11 standard that was just
released, and Embarcadero has worked
closely with the library standards groups
on the update. Further, the company has
added Embarcadero Standard Extensions
to the tool, which let developers
using ANSI ISO C++ use the agile extensions
to leverage such things as properties
and events as they would in C#, Delphi,
Java or Visual Basic, Swindell said. The
extensions also enable visual development
and rapid prototyping, he said.

Finally, C++Builder XE3 includes a 
64-bit compiler that the company said 
generates applications that can utilize 
more memory and data, and also access 
APIs, device drivers and system services
that are 64-bit.



Compatibility: Borland C++, C++Builder, Clang
Libraries: STL, Boost, Loki, ACE
Source: Embarcadero

C++Builder XE3 supports Mac development, with ARM support on the horizon.
ComponentOne extends
Windows 8 controls into suites 


BY DAVID RUBINSTEIN 

Saying that it is seeing as much interest 
in Windows 8 as it has in older platforms,
Microsoft component solution
provider ComponentOne has announced
support for the new operating
system in its release of its control suite,
Studio Enterprise 2012 v3. 

Chris Bannon, a product manager at 
ComponentOne (a division of GrapeCity),
said the company had the desktop
covered with its Windows Forms controls,
and was now extending Windows 8
controls into its Studio for WinRT 
XAML and Studio for WinJS packages, 
which were announced at Microsoft’s 
BUILD conference last year. 

Bannon explained that ComponentOne’s
control development centers
around two codebases: a JavaScript
codebase from which the WinJS controls
are created, and a XAML codebase
from which the WinRT and Windows 8
controls are developed. “This gives our 
users clear migration paths, and makes 
it easier to target multiple platforms 
from the same codebase,” he said. 

Greg Lutz, another product manager
at ComponentOne, said Microsoft
provides system controls that offer the
changes ComponentOne needs to
develop its tools, but that on JavaScript,
it’s a bit more difficult. Bannon 
explained that it’s somewhat trickier to 
do touch in the existing JavaScript 
Framework because there is no base 
control to inherit from. He added, 
though, that Microsoft has provided 
new event APIs for handling touch 
input, and that developers can even 
create their own gestures for use in an 
application. 

The company was also able to draw 
from its knowledge of JavaScript, and 
Bannon noted that JavaScript in Windows
8 uses many of the same standards
found in JavaScript for Web applications. 

Lutz also pointed out that ComponentOne
has created a number of controls
for building Windows Store applications,
which require touch and pen
input as well as the app bar and charms
for navigation. Among the new controls 
from ComponentOne are those for 
touch-first charts and gauges, and new 
high-performance ListBox controls that 
are available for all XAML platforms. 

“We think for these kinds of applications,
C# developers will move from
Windows Forms or [Windows Presentation
Foundation] to XAML,” Lutz
said, “rather than JavaScript/HTML.”


In other component news... 

Software development tool provider 
DevExpress has announced DevExpress
12.2, a toolset that helps developers
build solutions for Android, iOS and
Windows 8. The DevExpress 12.2 toolset
is included in the company's updated
Universal 12.2 Suite, which lets developers
target platforms such as ASP.NET,
WinForms, WPF and Silverlight, as well
as HTML5 and Windows 8 XAML. 

Document, content and imaging solution
provider AccuSoft has announced
ImagXpress v12, an imaging SDK that
now has enhanced image processing,
annotation, scanning and printing support.
Developers are now able to work
with larger images due to the SDK's
memory optimization enhancements.
Other optimizations include speed-optimized
insertion and deletion of TIFFs
for processing large, multi-page TIFF 
images. 

Database connectivity solution 
provider Devart has introduced new 
versions of its Delphi Data Access 
Components and VirtualTable. The 
new versions support RAD Studio XE3 
Update 1 and C++Builder 64-bit. 
Whereas previous Devart Data Access 
Components only allowed application 
development for Win32 and Mac OS X, 
using its new Delphi components and 
C++Builder 64-bit, developers can now 
create native database apps for Win64 
as well. 

UI development tool provider Infragistics
has debuted Indigo Studio Version
1, a free software design tool that
lets Web, desktop and mobile app developers
design functional, interactive and
animated UI prototypes. Incorporating
real-world context into designs can be
done using the tool's integrated storyboards.
Developers can design their UIs
using more than 21 built-in interactive
controls, 300 searchable icons, common
and curve-based shapes, and vector-based
stencils. The tool also lets
developers annotate designs and share
those designs with others.
THE YEAR IN REVIEW


BY ALAN ZEICHICK


What a year! As we look back on
2012’s highlights, we also look
forward to a time when IT’s
control will seem weak, HTML5 rules 
mobility, Microsoft surges, IBM and 
Google stand pat, and Apple retrenches. 

The year that was 

It was the Year of the Cloud yet again, 
although the consumer-facing word 
“Cloud” began to give way to the more 
specific concepts of SaaS, PaaS and 
IaaS. To the general public, anything 
you access on the Internet is now 
“Cloud-based,” whether it’s e-mail, 
Twitter, gigabytes of free storage and 
backup services, or your favorite retailer’s
shopping cart.

If Cloud is vague, well, so are SaaS, 
PaaS and IaaS. Amazon EC2, Salesforce.com,
Google, VMware, virtual
private Clouds, software-defined networks:
they’re all merging and branching,
like the crowds doing the dance in
the surprisingly catchy “Gangnam 
Style” by South Korean artist PSY. (No 
technology conference in the second 
half of 2012 was complete without at 
least one flash mob doing the dance.) 

Beyond Cloud, the other word that 
defined 2012 was “App.” There was an 
app for everything, and increasingly, it 
was written to run on Android, and was 
written as either pure HTML5 or as 
hybrid HTML5/native code. By midyear,
Android’s takeover of the market
was complete, and while pundits continued
to look toward Apple for innovation,
post-Jobs snafus brought unwanted
negative press.

From a relatively modest upgrade 
with the iPhone 5, iPad Mini and iOS 6, 
to a well-publicized fiasco with Apple 
Maps’ massively flawed geo database, 
the company’s only good news came 
from a strong stock price and a court victory
over Samsung regarding patents.

Indeed, with Microsoft’s release of 
Windows 8 and Windows Phone 8, it 
was clear that Apple wasn’t the only 
innovator in town. Love or hate the new 
Metro—ahem—Windows App Store 
user metaphor, with live tiles and 
touch-screens everywhere, it’s creative
and fun. For the first time in ages,
Microsoft was hot and Apple was not. 
Go figure. 

The year that will be 

What’s hot for 2013? Mobile, of course, 
but in many ways it doesn’t matter 
which platform. The completion of the 
HTML5 specification at the end of
2012 means that developers should
consider HTML5 first for all but the 
most performance-intensive apps—and 
even then, use native code only where 
necessary. Platform wars? Game over. 

A bigger battle is brewing in the 
enterprise, where Bring Your Own 
Device is reaching unstoppable 
momentum, whether IT likes it or not. 
The only remaining question is how to 
ride this new wave. 

A big player in all spheres is Big Data. 
The phrase has become ubiquitous for 
everything from predicting sales to 
tracking criminals. While consumers fret 
about their personal data becoming part 
of Big Retail’s Big Data, every business is 
trying to find ways to capture, store, 
process and leverage structured and 
unstructured data. The challenges aren’t 
new, but in 2013, words like “Hadoop” 
and “Cassandra” will be heard in nearly 
every boardroom. 

New too for this year will be mainstream
acceptance of APIs. No, not the
traditional application programming
interfaces that we found in Linux, Mac
OS and Windows; those are so 15 minutes
ago. No, today’s APIs are what we
used to call Web services. Remember
those? Yesterday’s RESTful Web Services
are tomorrow’s Web APIs. It’s code
reuse by yet another name. 

Developers continue to be central to 
corporate innovation and agility, and
2013 will be the year that mainstream
developers realize the benefits of building
in the Cloud. This year, managers
will invest in DevOps, the intersection 
of the Cloud, Agile and ALM. What will 
they be building? Mobile apps of 
course, but software of every stripe will 
be affected by the transition to Agile 
Cloud-based computing. And they’ll do 
it Gangnam Style.
Open source went deeper into the enterprise


BY ALEX HANDY 

Ten years ago, if you were working on 
an open-source project, you probably 
hosted it yourself. At the most, your 
team may have used SourceForge for 
storing your project code. 

But today, there is only one 
name in open-source software
project repositories: GitHub.

Throughout 2012, GitHub consistently
played host to the biggest, most complex
and most useful open-source projects.
Relative newcomers to the
open-source scene, such as Twitter’s
Bootstrap, Raphael and Phusion Passenger,
are all gaining popularity with both
users and developers. But what is it
about GitHub that makes it different 
from SourceForge? 

The answer is the social 
aspects. GitHub mines its 
data to show which projects 
are popular, which projects have just 
been updated, and which projects are 
seeing increased activity. It makes it 
much easier to check the pulse of a particular
project. And because popular
projects like the Linux Kernel and Ruby
on Rails are already hosted there, it’s a
sure bet that some of the best coders in 
the world are checking GitHub every 
day, if only to work on their own projects. 

Of course, just because you’ve posted 
a patch on GitHub and attached a pull 
request doesn’t mean your code is getting
into the kernel. A lengthy exchange
between Linus Torvalds and the rest of 
the Linux community took place this 
past August. It turns out that Torvalds 
only accepts pull requests done in Git 
proper, not those that exist on GitHub. 

continued on page 30 ► 


Mobile: New OSes, apps crossing platforms 


BY SUZANNE KATTAU 



If you’re a fan of new mobile devices, 2012 did not disappoint.
All three major mobile companies (among others)
came out with significant updates to both OSes and device 
hardware. And on top of that, numerous services came 
along to further bolster those platforms. It was a year of 
progress at the nuts-and-bolts level. 

But not all went smoothly. In August, Apple won its lawsuit
against Samsung, getting US$1 billion for patent
infringement. In September, Apple dropped Google Maps 
from iOS 6, partnering instead with TomTom for the new 
Maps app. But Apple faced much criticism over the new 
app, which was perceived as being inferior to Google Maps. 

Despite that, Apple also introduced iPhone 5 and iOS 6 
in September, as well as the iPad mini in October. There are 
no signs yet that Apple’s continuous pace of hardware and 
software releases will abate anytime soon. 

In June, the new version of Google’s Android OS, 
Android 4.1 Jelly Bean, was introduced at Google I/O. Perhaps
more importantly, Android reached a milestone in
October when the Android-based Samsung Galaxy S3 outsold
the iPhone 5, marking the first time an Apple smartphone
has been outsold.

In October, Microsoft released Windows 
8. Normally that’s big news for just the 
desktop, but the company’s vision for its 
marquee OS now includes mobile 
devices. Will 2013 see this
concept embraced by
developers and consumers alike?

RIM entered the year battered, but it was
determined to show it’s not dead yet. In June, it began its worldwide
BlackBerry 10 Jam World Tour to introduce new
SDKs for developers interested in developing for the 
upcoming BlackBerry 10 device. So far, it seems to have 
stopped RIM’s decline. 

On the software side, we saw development platforms 
introduced to help developers create apps for multiple 
mobile platforms. In May, Anywhere Software and Xamarin 
introduced Android development tools to allow Visual Studio
designers unfamiliar with Java to
learn how to create and deploy apps for
Android. In September, DevExpress introduced a
tool that lets Visual Studio developers
build apps for Android, iPad, iPhone and Windows 8. In
October, Icenium, a browser-based integrated cloud environment
from Telerik, was introduced to address this growing
trend of developers building cross-platform mobile apps
using CSS, HTML5 and JavaScript. Other cross-platform 
mobile development frameworks came from Appcelerator, 
Infragistics, PhoneGap, RhoMobile and Sencha. 

In September, Facebook CEO Mark Zuckerberg 
declared that Facebook’s biggest mistake was using HTML5 
for mobile development. This reignited a still-ongoing 
debate as to whether or not developers should go native or 
use HTML5 when building mobile apps. Expect this debate 
to continue well through 2013. 

In June, at IBM’s Innovate 2012, cloud computing’s role 
in mobile was discussed at length. Mobile Backend-as-a-Service,
a cloud-based set of services that provides developers
with customizable back ends for mobile app platforms
so that they can focus their time and energy on the apps
themselves, emerged in 2012. Scalability and automatic
RESTful API generation are two of the most important
features mobile app developers can expect from mobile
Backend-as-a-Service.
◄ continued from page 28

While a minor distinction, this caused 
some uproar in the community, as developers
finally understood why their
patches were continually ignored. 

Having all of those eyes in one place 
helps to make GitHub the center of the 
open-source universe. But GitHub’s
charm isn’t entirely about its “Hub.” A
lot of the draw is Git itself, and 2012
was the year that commercial application
development tool vendors finally
realized this fact. 

That’s why almost every major repository
vendor found some way to integrate
Git this year: Atlassian and Perforce both
now offer services and support for it.
Elsewhere, Git gained better OS X support
through open-source projects.

It was business as usual at the Apache 
Foundation, as the Hadoop project continued
to pull the most commits from the
development community. Hadoop itself
didn’t evolve much during 2012;
Hadoop 2.0’s major changes are still
being planned for a nebulous release
date. But the Hadoop ecosystem saw
great leaps forward, with Apache-hosted
projects like Avro, Pig and Cassandra all
expanding capabilities. The key-value
store Cassandra, in particular, spent the 
year getting closer to Hadoop through 
support for running Map/Reduce jobs 
against a Cassandra cluster instead of 
against an HDFS cluster. 

But no discussion of 2012 could be 
complete without NoSQL. Last year, SD 
Times attempted to drain some water out 
of the NoSQL swamp by offering a 
stricter definition of it. For our readers, 
NoSQL means high-performance databases:
either key-value stores, JSON
stores, or other non-relational data storage
systems designed for speed of development
and ease of scalability.

MongoDB is seeing success with lone 
developers, or with developers rapidly 
building single-tier Web applications. 
MongoDB took a number of hits in 2012, 
as many developers discovered just what 
it can and cannot do. One thing it cannot 
do is scale easily, and 2013 should see 
10gen addressing this problem.

Couchbase is the result of the merging
of Membase and CouchDB. This
was the last year CouchDB’s founder, 



Clouds reshaped the Web 

BY ALEX HANDY 

You may not have noticed this while you were checking your favorite websites 
each morning, but the Web changed dramatically in 2012. According to Mary 
Meeker, partner at venture capital firm Kleiner Perkins Caufield Byers, in May 
of 2012, mobile traffic to websites surpassed desktop traffic. 

That shift in traffic focus pushed many companies to find solutions for their 
mobile application strategies. For many, the solution has 
been to build internal APIs upon which mobile applications
can be built. This was the tactic of Netflix, which
offers a single API for accessing its giant store of movies and TV shows. Endpoint
clients are simply a natively written window into that API, meaning every
supported platform is simply an API-receiving dumb terminal. 

2012 was also the year Platform-as-a-Service grew up. Public cloud companies
like AppFog, CloudBees and Piston Cloud pushed their hosting platforms
as a solution to development woes. 

CloudBees, for example, offers online build-and-deploy tools to give developers
not only a place to put their applications, but also a workflow to continuously
integrate those applications into a server environment. Piston Cloud, on
the other hand, is attempting to build a cloud-hosting business based on cutting-edge
technologies like Ceph and OpenStack.

But enterprises have long insisted that PaaS won’t work for them. They need 
an on-premise solution, and that’s why companies like Rackspace, Red Hat and
VMware have all moved toward offering products in this space. 

VMware’s Cloud Foundry came into its own in 2012. In the spring, the company
announced the release of a version of its PaaS designed for local development.
Cloud Foundry Micro Edition allowed developers to spin up a local instance of 
the PaaS, and to test their applications locally, without needing to upload code to 
a server. 

Red Hat, on the other hand, built its OpenStack-based
PaaS, OpenShift, for enterprise
users who don’t feel comfortable living entirely in someone
else’s cloud. OpenShift was announced earlier this
year, and the company released a newer version in
November that offered more enterprise features.

2012 was also the year that OpenStack got serious, as
indicated by its newly formed governance organization and
board of directors. The technical side of the project saw great
advances as well, as the platform received some of
the first functional components of Project Quantum,
which seeks to build software-defined
networking tools for OpenStack; and Project
Swift, which is an effort to build an ISO
repository for storing applications and their environments
in a maintainable way.




Damien Katz, would work on that project.
Now at Couchbase, he and the
development team have been working
on a fork of the project specifically tailored
to the Couchbase platform.

For mobile developers, the Android 
platform moved up to jellybeans. The 
release of Android 4.1 in the summer
saw many new features added to the
mobile device platform, including a
faster interface paradigm, and the transition
within the OS and the Google Play
store from non-protected binaries to
protected binaries. This change should
help curtail the rampant piracy rife within
the Android community.
Developers took responsibility for security


BY SUZANNE KATTAU 

Criminals increasingly breached enterprise
networks through mobile, Web
and third-party apps last year. The
cloud and its inherent multiple environments
often left backdoors unintentionally
open, which made
them even more enticing
to criminals.

Because of this, software development
managers had to begin testing
their apps as thoroughly as IT tests its 
security infrastructure. Those changes 
affected the role developers were 
expected to play in app testing, but their 
efforts in this area were often deemed 
lacking. 

The year began and ended with 
developers still being seen as not doing 
enough to secure their programs, according
to two surveys by Veracode and
CAST. Veracode, a cloud-based app
security company, tested 9,910 apps and
found that 80% didn’t pass its security
standards (which include a zero-tolerance
policy for cross-site scripting and
SQL injections). Veracode’s survey found
that SQL injection threats were present
in 32% of all Web apps tested, and 68%
of apps had scripting vulnerabilities.
CAST, a software analysis and measurement
firm, found that security vulnerabilities
are not limited by
programming languages. 
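
For readers who have not seen a SQL injection up close, the sketch below shows the flaw and the usual fix side by side, using Python's built-in sqlite3 module. The table, the function names and the sample data are invented for the example and are not drawn from the Veracode or CAST surveys.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input is pasted into the SQL text, so a crafted
    # value can change the meaning of the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the value travels separately from the SQL text and
    # is only ever treated as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Passing the string ' OR '1'='1 to find_user_unsafe returns every row in the table, while find_user_safe treats the same string as an ordinary name and finds nothing.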

If you were a developer 
who would have loved to better incorporate
security into your app but felt
you just weren’t equipped to do so, you 
weren’t alone. In September, the lack of 
security tools suitable for developers 
was cited by Forrester as among the 
reasons why developers were still not 
using secure software development 
practices. Also, developers still needed 
to better integrate security into their 
development practices from the earliest 
stages, according to development testing
tool provider Coverity.

To help developers find and fix code 
defects during development, some vendors
came out with tools. In October,
Coverity released a new testing tool to
help developers with software security
issues. The company announced Coverity
Test Advisor, a change-impact analysis
tool within its newly expanded
Coverity Development Testing Platform.
The Coverity Test Advisor tool
alerts developers to high-risk changes
in code that occur during development,
and can identify traditionally untestable
issues, which the company said is basically
anything that can’t be identified
through functional or performance testing
in QA.

The Bring Your Own Device trend 
kept growing last year, and brought with 
it more security concerns for companies. 
In late February, the RSA Conference, 
which annually covers IT security issues, 
had everyone talking about the BYOD 
trend and externally controlled devices 
accessing internal networks. The security
conference also focused on how companies
can protect their networks from
the threat of activist reprisals from the
Internet hive mind known as “Anonymous,”
which tries to take advantage of
application vulnerabilities such as SQL
injections.

The RSA Conference also covered 
security appliances and cloud-based 
security solutions, with software-quality 
validation vendor Veracode discussing 
how some companies are using it as a validation
service against third-party applications
their employees are using on
mobile devices. Veracode also discussed 
how some companies are validating 
externally created apps that use its secure 
APIs to ensure security compliance. 

In August, Information Security 
Forum vice president Steve Durbin 
shared how organizations can protect 
themselves from cybercrime, and how 
software development managers can 
handle security issues. From a software 
development standpoint, he said managers
have to look at whether they’re outsourcing
some of their development or
whether they’re doing it all in-house. 
Outsourcing software development 
demands that they set in place certain 
checks to make sure that the code that is 
coming back has been thoroughly tested 
to their satisfaction. 

As a software organization builds its 
apps, it creates intellectual property (IP). 
Depending on the company, IP can be 
items that provide a competitive advantage
such as proprietary trade secrets,
algorithms in source code, or any unique 
characteristics of a product. Protecting 
that IP within the enterprise (as well as in 
distribution) from being stolen was a 
growing security-related issue that came 
into public scrutiny during Microsoft’s IP 
tussle over Android. Microsoft had given 
Android device manufacturers two 
choices: Sign Microsoft licensing agreements
or risk being sued for patent
infringement. Where some saw 
Microsoft’s policy as an anti-competitive 
play against Android, Microsoft’s position 
was that it was simply protecting its intellectual
property.

And finally, in September, software 
development managers were given 
advice from a variety of experts on what 
they could do to keep intruders out of 
their own company’s code and licenses,
which is knowledge that they
can take with them well into
the new year.


A slow year for Java 
was a welcome reprieve 

BY ALEX HANDY 

It’s been a rough couple of years for Java. With the acquisition of Sun Microsystems
by Oracle, Java’s future was anything but certain, and after years of stagnation
and falling behind on the Web, the language was looking a bit outdated
in 2009.

But that was then, and this is now. Java has gone back to being a reliable old 
tool in the box, rather than the soap-opera poster child for process-locked committees.
With the OpenJDK pushing toward version 8, and the Java ecosystem
back in full bloom, the development world can return to regular work, instead 
of reevaluating existing investments every time Sun put in a bad quarter. Plus, 
Oracle has proven itself to be a good steward of the language, resisting the 
temptation to bog down the JCP with Oracle-specific features. 

The long, painful Java pause of the late aughts is over. Back to work, folks. 

If there’s anything left in the Java world resembling a soap opera, it’s to be
found inside VMware and at the Eclipse Foundation. VMware’s SpringSource
acquisition of 2009 has, essentially, gone sideways. Instead of releasing time-saving
tools like Roo, or improving the Spring framework to
keep up with a changing Web environment, SpringSource’s
core has been pushed over to CloudFoundry, VMware’s
Platform-as-a-Service offering. As such, Spring has essentially been left on the
vine to ripen based on community efforts, not VMware’s investments.

At the Eclipse Foundation, the move from the 3.x to the 4.x version of the 
underlying Eclipse platform was a bumpy one. Performance and incompatibility
issues kept the Eclipse lists ablaze with complaints and discussions on how
to fix them. And, as we enter 2013, this is the primary hope for the IDE: When 
will it be usable again? Of course, some would say it’s fine now, but what are 
developers if not opinionated about their tools? 

Of course, Java remains the runtime of choice for almost half of all enterprises.
As such, other languages continue to take root on the JVM. Last year, however,
the Web development framework known as Play gathered a head of steam
as an answer to Web application woes for Java developers. Play is designed to
use Scala, but many developers are finding that it works well enough with existing
Java infrastructure that it still speeds up Java development.

So while the Java language itself didn’t experience much change over the
course of the year, the Java ecosystem as a whole has returned to its prior
vibrancy. It turns out that while Oracle has scared many away from MySQL to
MariaDB, from Hudson to Jenkins, and from OpenOffice to LibreOffice,
there’s one open-source project it’s not going to squeeze for every penny: the
OpenJDK. That’s because this was also the first year in which Oracle truly
showed how it’s going to make money from the language and platform: by
selling you middleware.

The year that agile tipped the scale


BY SUZANNE KATTAU 

Application life-cycle management (ALM) focused much 
on agile development methodologies in 2012. 

There was increased discussion last year about how to 
scale agile for larger teams and projects, as well as what the 
new term “agile portfolio management” means. We also saw 
enhanced tools for Scrum as well as for lean/Kanban. Other 
ALM topics in the spotlight included continuous build/integration,
continuous delivery, where testing fits in, and how
often developers should test. 

Early in the year, CollabNet brought Git (for code management)
to the enterprise, and its announcement came shortly
after HP announced its ALM integration solution. Both pieces 
of news reflected a growing trend among developers: They 
want to keep their variety of tools but still have a single place 
of traceability and visibility through 
the application life cycle. 

In August, this new theme of how to scale agile was one of the
topics introduced at the Agile 2012 conference in Dallas. In his
keynote address, Stanford professor Robert Sutton discussed how to
scale agile successfully, explaining how a shift in mindset is important to
do so. He also discussed agile portfolio management, which is a new term
that describes how companies are viewing agile not just from the developers’
point of view, but as a way of doing business.

Yet later in the year, VersionOne took a step back and
released a scaled-down version of its management software
for use in teams. According to CEO Robert Holler, agile got 
ahead of itself somewhat, and TeamRoom brings agile back 
to where its creators intended: to the development teams. 

As popular as agile is, there is
still much confusion surrounding
how to do it successfully. In July, a
survey by research firm Voke revealed that many organizations
are still diving into agile without clearly understanding
it. For example, many companies use “Scrum” as a catchall 
term for any agile practices or approaches they have adopted 
without understanding what it is before implementing it. 

To help developers understand agile and how to implement
it, Emergn announced what it said was the industry’s
first work-based learning program for agile development. Its 
Value Flow Quality Education program teaches agile, lean 
thinking and practices to developers. 

Agile developers were also expected to play a bigger role in 
securing their apps last year. SAFECode, an industry group, 
offered guidance on how to do so. In July, it issued a paper to 
developers listing 36 practices to help them reduce software 
security flaws.


What didn't Microsoft do to regain ground? 


BY DAVID RUBINSTEIN 

Windows 8. New Surface and phone 
devices. The .NET Framework 4.5. 
SharePoint 2013. Office 365. Yammer. 
Updates to Windows Server, Team 
Foundation Server, Windows
Azure, SQL Server and
Visual Studio. 

What DIDN’T Microsoft do in 2012? 

For a couple of years now, Microsoft
has been laying out its vision of computing
in the cloud, with a user experience
that surpasses the desktop, and developer
tools to ease the creation of new applications
for this paradigm. And in 2012,
that vision became (largely) reality.

It began with the release of Windows
8, an operating system that marked a sort
of break with the past. The concept of
live tiles was introduced to replace icons
on the desktop, where news feeds, calendar
updates, photos and more could be
viewed in a tile without even opening the
application. Touch, too, became a big
part of the operating system; a demo at
2012’s TechEd showed a person working
on a device using touch, a mouse and
keyboard, as well as a stylus
to write directly into an app
such as OneNote.

But it was not without a few missteps.
The company called the new UI “Metro”
style, and had to backpedal from that
name when a small tech company produced
a patent. Now, its new, wonky
name is Windows 8-style applications.
Also, Microsoft released Windows 8
devices with an option to revert to Windows 7,
showing that it remains guilty of
offering too many options. Instead of
leading, the company is accommodating,
which leads to poor performance.

For developers, the Visual Studio
2012 update was important if not revolutionary,
introducing a new emphasis on
“continuous quality” with new test capabilities,
ALM tools for SharePoint development,
and additional features for agile
development. The new Windows Store is
where developers can sell their apps and
find their riches. For Office 365, SharePoint
2013 and SharePoint Online developers,
the Office 365 Marketplace is
open for business, but devs must learn
the new development model first.

There was, though, one big thing 
Microsoft did not do last year: Hold its 
position among technology companies, 
falling to third behind Apple and Google 
in market capitalization. But, as Apple 
struggles to regain its footing in the wake 
of Steve Jobs’ death in 2011, and as 
Google’s Android, tablet and laptop initiatives
did not blow industry pundits
away, Microsoft might have made up
much ground in the device wars.

A safe migration


No need to worry about losing 
compatibility with .NET 
while upgrading 





BY PATRICK HYNDS



There are some people who always have to be
on the bleeding edge of technology, such as
for Visual Studio and the .NET Framework.
But this is not the case for most developers, and that
goes double for development teams that have to
deliver solutions according to a schedule, however unreasonable that
schedule might be.

Keeping up with every single release takes a great deal of time and 
effort. It is easy to skip a version, then two, and before you know it the 
articles and screenshots no longer make sense, and the assumptions 
about things make even plain English inscrutable. 

It is important to get back on track by understanding what you are
missing and where to look when ramping
up. The latest version of the
Microsoft development platform
enables new things to be done, but
there are also things that cannot, or can
no longer, be done with it. There is
nothing quite as frustrating as finishing
a project and realizing that your favorite
deployment mechanism is no longer
supported (hint hint).

The very best reason to move to
Visual Studio 2012 without delay is
that Visual Studio 2012, like Visual
Studio 2010 before it, allows you to
target the version of the .NET Framework
that you want instead of only the
one that ships with Visual Studio. To
this point, David Yack, CTO of Colorado
Technology Consultants (a Colorado-based
Microsoft consulting
firm), said, “I think it is important to
separate moving to Visual Studio 2012
and moving to .NET 4.5; we have
moved to Visual Studio 2012 as fast as
possible. Moving to .NET 4.5 has been
slower, because unless you need a feature
it offers, there really is not the
same motivation.”

The freedom to target different
framework versions allows his firm to
take advantage of the little changes that
make for better productivity. He maintains
that even faster startup or small
feature changes in the editor make it
worth moving as early as projects and
schedules permit.

Some organizations require following
the “first service pack rule” to
ensure stability before adopting anything
new, but even those organizations
can tamp down the learning
curve by starting familiarization early.
If you go this route, Yack suggested
that you have a couple of people on the
team lead the way and use it for a couple
of weeks, so when the rest of the
team jumps in, they are not all learning
at once and have people on the team
to ask how to do specific tasks. The
ability to install and try out multiple
versions of Visual Studio side by side
makes it fairly inexpensive to take this
approach.

C++ renaissance

When Microsoft first started hinting
at the developer technologies in the
wave that would accompany the
release of Windows 8, there were
some mysterious statements at the
time concerning C++. The promise
was that C++ would become a first-class
citizen in the family of languages
in Visual Studio, along with Visual
Basic and C#. For a decade, the tools
have improved for Visual Basic and C#
in a relentless and often competitive
drive between the teams that support
each language. The reality is that in
Visual Studio 2012, much of the
ground has been made up with C++
because Microsoft wanted to widen
the scope of potential developers for
Windows 8 applications, and make
sure that great performance was
attainable.

To understand just how far C++ has
come from behind, I discussed the current
situation with Kate Gregory of
Gregory Consulting. She is the author
of “C++ AMP: Accelerated Massive
Parallelism with Microsoft Visual C++,”
as well as a number of C++ and Visual
Studio courses available online at
Pluralsight.com. When asked why her
company is using Visual Studio 2012,
she said, “Most of our new development
at the moment is in C++, native
code, not .NET. And for these projects,
Visual Studio 2012 brings us a number
of very important C++11 features,
along with C++ AMP and support for
developing apps that target the Windows 8
store, Windows RT and Windows
Phone 8.”

She explained that C++11 is like a
whole new language, an evolution of
C++ that has made the resulting language
readable, safe, easy to use and
indisputably the fastest at runtime.
Knowing that I would be slow to
believe that C++ is now easy, she
emphasized that pointers and many of
the harder artifacts of old-style C and
C++ are no longer parts of the picture.
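
To make Gregory's point concrete, here is a minimal sketch of the C++11 style she describes. The Customer type and the function names are hypothetical, invented only for this illustration; the library pieces (std::shared_ptr, std::make_shared, range-based for, lambdas, auto) are standard C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical record type, invented for this illustration.
struct Customer {
    std::string name;
    int orders;
};

// Smart pointers own the objects, so there is no delete to forget.
std::vector<std::shared_ptr<Customer>> make_customers() {
    std::vector<std::shared_ptr<Customer>> customers;
    customers.push_back(std::make_shared<Customer>(Customer{"Ada", 3}));
    customers.push_back(std::make_shared<Customer>(Customer{"Grace", 7}));
    customers.push_back(std::make_shared<Customer>(Customer{"Linus", 1}));
    return customers;
}

// Sum the order counts with `auto` and a range-based for loop.
int total_orders(const std::vector<std::shared_ptr<Customer>>& customers) {
    int total = 0;
    for (const auto& c : customers) {
        assert(c != nullptr);  // a smart pointer can still be empty
        total += c->orders;
    }
    return total;
}

// Count customers at or above a threshold using a lambda.
int count_frequent(const std::vector<std::shared_ptr<Customer>>& customers,
                   int min_orders) {
    return static_cast<int>(std::count_if(
        customers.begin(), customers.end(),
        [min_orders](const std::shared_ptr<Customer>& c) {
            return c->orders >= min_orders;
        }));
}
```

Calling total_orders(make_customers()) walks the collection without any manual memory management; the objects are released automatically when the last shared_ptr to them goes away.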

The claims of better performance
with C++ have always rung true
because there is the tradition of C++
being closer to the metal with fewer
layers of abstraction between the programmer
and the hardware. By all
accounts, this is still true today, but the
real story is the performance gains to
be realized with C++ AMP, which Gregory
pointed out “can speed up parts of
your application up to 100x by moving
the parallel work into the GPU.”

AMP stands for Accelerated Massive 
Parallelism, and it lets you perform 
lightning-fast calculations for workloads 
that fit the proper profile that allows 
the Graphics Processing Unit (GPU) to 
be efficiently loaded up with waves of 
the calculations. 

The GPU in modern computers
has enabled fantastically realistic
experiences. This is because physics
calculations, lighting effects and rendering
tasks fit well into the kinds of
tasks that the GPU does well. Another
area that has successfully used the
GPU is cryptography. For example, if
you want to crack encryption, you
have to run all potential possibilities
through the same calculation. Ultimately,
if you have many repetitive
calculations that can be loaded and
unloaded en masse, then the really
high-performance improvements
described are attainable if you are
willing to use C++.
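
The workload profile described above (one identical, independent calculation applied to many elements) can be sketched in portable C++. This is not C++ AMP and does not touch the GPU: it is a CPU-only analogy using std::thread, offered under the assumption that the shape of the problem is what matters here. The kernel and parallel_map names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// The "kernel": one independent calculation per element.
// Hypothetical stand-in for the per-element work a GPU would run.
inline float kernel(float x) { return x * x + 1.0f; }

// Apply the kernel to every element, splitting the index range across
// `workers` CPU threads. Each thread writes a disjoint slice of `out`,
// so no locking is needed.
std::vector<float> parallel_map(const std::vector<float>& in, unsigned workers) {
    assert(workers > 0);
    std::vector<float> out(in.size());
    const std::size_t chunk = (in.size() + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(in.size(), w * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = kernel(in[i]);
        });
    }
    for (auto& t : pool) t.join();
    return out;
}
```

Because no element depends on any other, the same loop body could in principle be handed to thousands of GPU lanes instead of a handful of CPU threads, which is the kind of workload the article says benefits most.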

Aside from the performance boosts,
the biggest updates for C++ developers
are the features in the Visual Studio
IDE that have made C# and VB.NET
development so easy over the last
decade. C++ projects can now be
jump-started thanks to project and item
templates. Once the project is created,
it can load asynchronously for faster
startup. IntelliSense has become a
must-have feature, with Microsoft
bringing this time-saver to virtually
every editing interface. That includes
C++, which complements the list functionality
for objects so that developers
can easily choose relevant members
from objects.

There are other features as well,
from Architecture Explorer to unit
testing with C++, and it all adds up to
C++ developers really being caught up
in a single version. Gregory summed up
our conversation by saying, “A native
developer who stays on older versions
of Visual Studio is missing all of these
great capabilities.”

From NuGet to design patterns

Developers who have been using Visual
Studio over the years are accustomed
to building applications from
component parts like a Lego sculpture.
Controls, libraries and frameworks
each joined the ranks of the pieces
that could be brought into a project
to form the solution. This capability,
though, also brought a drawback
in the form of the extra work
required to get the proper libraries
installed or in place so that they can be
leveraged.

This is a world familiar to Java programmers
and one that Microsoft
developers do not enjoy navigating. To
help solve this problem, NuGet was
introduced in 2010 to provide package
management for .NET libraries.
Approximately a year earlier, Microsoft
announced the Web Platform Installer
to solve this same kind of problem for
Web components, including Internet
Information Services (IIS).

NuGet is a more broadly targeted
package manager developed by the
Web Platform team at Microsoft that
allows third-party libraries to be registered
for easy deployment, and it has
been quickly adopted by a wide audience.
For prior versions of Visual Studio,
NuGet is available for installation at
visualstudiogallery.msdn.microsoft.com.
With Visual Studio 2012, you can even
skip this step, since NuGet is already
integrated. I normally see this integration
in the context menu presented
when right-clicking on the
project node in Solution Explorer and
choosing “Manage NuGet Packages...”
(see Figure 1).

Figure 1. NuGet allows third-party libraries to be registered for easy deployment.

Beyond package management,
there has also been the shift toward
design patterns. Years ago, developers
built client-server applications that
evolved into three- or even n-tier
applications. These days, the concept
of design patterns allows developers
to find the right size for the architecture
of their solutions without changing
the way they do all their development.
The evolution of MVC is a
prime example of this transition to
using patterns to emulate and implement
well-understood solutions to
complex problems.

The Microsoft Patterns and Practices
group has regularly released tools
and frameworks that have implemented
useful design patterns so that developers
do not have to convert everything
they do while still benefiting from the
right solution for complex problems.
This will likely be even more critical in
the future as we try to leverage the raw
power that even basic systems provide
due to their multi-processor and multicore
nature.

Team Foundation Server 

No conversation about productivity 
gains with Visual Studio 2012 would be 
complete without talking about Team 
Foundation Server (TFS). For many 
development teams, the capabilities in 
TFS have motivated them to keep 
pace with new versions of Visual 
Studio, especially when managing 
larger projects with teams of technical 
resources. For additional insight, 
I turned to Richard Hundhausen, who 
is a Microsoft Visual Studio ALM 
MVP, a Professional Scrum Trainer, 
and the author of “Professional Scrum 
Development with Microsoft Visual 
Studio 2012.” 

I asked him where TFS stands relative
to other developer collaboration
tools in the space. He said, “The ALM 
tools in Visual Studio 2010 were 
enough to move Microsoft to the 
upper-right leaders quadrant in Gartner’s
2012 ALM Magic Quadrant
report. Visual Studio 2012 adds to this 
capability in many ways, especially for 
Scrum teams. We use and recommend 
Visual Studio 2012 for Scrum teams 
doing any kind of .NET development, 
or using tools that can integrate with 
Team Foundation Server 2012, such as 
Eclipse.” 

TFS is a powerful set of tools that 
has replaced Visual SourceSafe as the 
Microsoft source-control offering. 
Leveraging TFS as your organization’s
source control is a good way to get
your feet wet with it, but beware ignoring
the rest of its capabilities. Hundhausen
pointed out that he sees too
many teams doing Scrum and using 
Visual Studio 2012, but not together. 
“They may use TFS for source control 
and automated builds, but are still 
using whiteboards or sticky notes to 
plan and track their work,” he said. 
“While this is perfectly fine, they are 
missing the boat when it comes to 
traceability, auditing, and reporting 
requirements.” 

Visual Studio 2010 started the
process of Web-enabling the team
development functions that Visual Studio
2012 has taken much further. Figure 2
shows the Team Web Access
interface that is part of TFS 2012. New
to this version are greatly improved
interfaces for the project-manager
role, including replacing the
spreadsheet interface from 2010 for
managing product backlog and iteration
backlog by project. Hundhausen
finds the Web-based
agile-project-management tool
improvements to be the
handiest aspect to Visual Studio
2012 because “They allow
a Scrum Team to visually manage
their Product Backlog,
Sprint Backlog, and
tasks using drag-and-drop.”


Though Hundhausen admitted that
it took a while to get used to the IDE,
the functionality of Visual Studio is
greatly improved. “Everything just
feels faster,” he said. “The new testing
experience is slick and can run tests
continuously. It can even run tests
from your favorite framework (NUnit,
xUnit, MbUnit, etc.).”

Improved project-manager interfaces were
the main update for Team Web Access.

John Alexander, managing partner 
of AJI Software and co-host of “The 
AJI Report” podcast, is also a proponent
of the advantages to development
teams to be gained by leveraging
TFS. He stated, “The productivity
gains from TFS have long allowed our
teams to be faster at generating quality
code. Team Foundation Server
2012 adds to this geometrically with 
the multitude of enhancements that 
enable us to communicate status and 
progress to both our clients and within 
development teams faster than ever 
before.” 

Hundhausen summed it up by
saying, “If the team is already
using TFS, an upgrade to 2012 is
a no-brainer, especially for Scrum
teams. The Team Explorer window
is more integrated and
cares about the context of your
work, allowing a busy, multitasking
developer to suspend
work, put out a fire,
and resume right where
he or she left off.”

Windows 8 

Windows 8 introduces its
own type of application supported by
the new WinRT platform, which
strangely enough is hard to refer to
since the official marketing name has
become unavailable due to it being
trademarked by a company in Germany.
(For that reason we do not call
them Metro-style applications anymore.)

Many now call them WinRT or Win 
8-style applications rather than use 
the term Modern UI that Microsoft 
seems to be pushing of late. Some 
Microsoft employees have even 
referred to them as Windows Store 
apps, but that seems to add more confusion
since technically .NET applications
can also be listed in the Windows
Store (though some will only be listed 
as links to download websites and not 
directly installable like WinRT apps). 
There are .NET for Windows Store 
apps that are required to use a limited 
set of classes to qualify for installation 
directly from the Windows Store. 
There are big parts of the .NET 
Framework unavailable in this case, as 
even data access via ADO.NET is not 
available if you choose this path. 

Even if you do not plan to build 
WinRT apps, this is a topic that bears 
paying attention to because a great deal 
of the innovation described here is the 
result of Microsoft betting big that 
WinRT applications will help them disrupt
and capture the tablet and app
market. Alexander said, “Several of our 
clients are interested in building apps 
for WinRT, so Visual Studio 2012 is 
required equipment.” 

This move is also pushing the 
emphasis on asynchronicity in the 
.NET Framework 4.5, because one 
imperative to Microsoft’s plan is 
responsive user interfaces, and that 
requires anything taking too long to be 
able to be run as a callback. This helps 
keep the interface responsive. For now,
WinRT applications are the only
option, aside from browser interfaces,
on the editions of Windows 8 that run
on the ARM processor. But they are
also limited to only running on Windows 8,
meaning that they represent yet
another option for developers, but not a
unifying option.

As we will see in the next section,
there are new features in the .NET
Framework 4.5 that help ameliorate
this aspect. According to Microsoft, 
sales of Windows 8 in the first month 
are promising, and the Surface system 
has piqued interest in some circles, but 
it is hard to really know how this bet by 
Microsoft will play out in the market. 

New .NET features 

Each new version of Visual Studio has
introduced a new version of the .NET
Framework starting in 2001. The difference
in the case of Visual Studio 2012 is
that the .NET Framework version it
provides (4.5) is not side-by-side compatible
with the version that came
before. Technically, .NET 4.5 is an
upgrade to .NET 4.0, so while they are 
functionally consistent for the vast 
majority of situations, there are some 
things that are not the same at the base 
level. 

Portable Class Libraries allow code
to be shared across platform implementations.
For example, if you want to
build an application and provide interfaces
to both Windows Phone and WinRT,
the Portable Class Library limits the types
to those that are common across the
targets. This is limiting on one hand,
but protects you at the outset from 
expecting something that will not be 
available. 

Parallel programming has taken
another step forward with .NET 4.5.
First introduced with .NET 4.0, the
Task Parallel Library (TPL) recognizes
that systems today are multiprocessor
and multicore, but it is
intensely difficult for a developer to
take advantage of this without becoming
an expert on threading. The
improvements in parallel processing
are the source of the Async capabilities
that are a major theme for WinRT
applications’ ability to ensure that the
UI is always responsive.
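
The idea of pushing slow work off the UI thread can be illustrated outside of .NET. The sketch below uses std::async and std::future from the C++11 standard library as an analogy; it is not the Task Parallel Library or the C# async/await syntax, and the slow_lookup, lookup_async and run_ui_until_ready names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical slow operation (for example, a file or network request).
int slow_lookup(int key) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return key * 2;
}

// Start the slow work on another thread and return immediately.
std::future<int> lookup_async(int key) {
    assert(key >= 0);
    return std::async(std::launch::async, slow_lookup, key);
}

// Simulated UI loop: keeps "painting frames" until the result is ready.
// Returns the result and reports how many frames ran while waiting.
int run_ui_until_ready(std::future<int>& pending, int& frames_painted) {
    frames_painted = 0;
    while (pending.wait_for(std::chrono::milliseconds(5)) !=
           std::future_status::ready) {
        ++frames_painted;  // the calling thread stays free to do its own work
    }
    return pending.get();
}
```

The caller never blocks for the full duration of the slow operation; it checks in briefly, does its own work, and collects the result once it is ready. The .NET 4.5 features aim at the same outcome with far less ceremony.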

Windows Presentation Foundation
(WPF) got some love in the form of a
new Ribbon control, the ability to bind
data to static properties, and data-binding
custom types that implement the
ICustomTypeProvider interface. There
are other changes that are clearly just
fixing past oversights, like the ability to
check if the data context of an item container
is disconnected.

Windows Communication Foundation
(WCF) also got contract-first support
(allowing developers to define
how systems will be called, then generating
content from that), simplified
configuration files, and WebSocket
support for bidirectional Web communications
over HTTP and HTTPS.
Overall, WCF got about twice the 
attention that WPF did in this latest 
Framework version. 

Core .NET Framework classes got
some minor tweaks as well. They vary
between incremental updates to controllability,
such as being able to set the
default culture for an application
domain, to deep system-performance
improvements such as better background
garbage collection for servers.
In this regard, .NET 4.5 does feel like a
service pack for .NET 4.0, and given the
version number and fact that they are
not side by side, that seems about right.

Installer dilemma 

Developers who provide installation 
packages to their users have historically 
had the ability to choose between using
one of the various commercial installer
packages (such as those available from
InstallShield and Nullsoft) versus taking
advantage of the setup projects
available in prior versions of Visual Studio.
Things have changed now that
Visual Studio 2012 no longer provides 
the ability to choose this latter option, 
because the setup project templates 
have been removed. 

There is another alternative for
those of us who have depended on setup
projects: WiX, which stands for Windows
Installer XML. WiX is a SourceForge
project that lets you create your
own Windows Installer package. For
the latest version of this open-source
tool, check out wixtoolset.org. WiX
looks more attractive once you are
aware of the fact that the field of commercial
installer packages has narrowed
since Symantec withdrew Wise from
the market in 2010.

The good, the bad and the ugly 

There is a great deal of good stuff for
developers that should drive them to
use this latest version of Visual Studio
and the version of the .NET Framework
that comes with it. As we have seen, the
story is not all roses due to the need to
adjust to the loss of the setup project,
though that is not enough to dissuade
most from moving up to this latest edition
of Visual Studio. More than a few
people have expressed distaste over the
new interface in terms of the colors and
the use of all caps in the menus.

Colorado Technology Consultants’ 
Yack said, “The ALL CAPITAL letters 
in the menu and the lack of most of the 
color on the icons takes some getting 
used to, but the ability to simply search 
for a command you are looking for 
using Quick Launch makes it easier to 
find things that have moved around.” 

There will be foibles in any new version
of a popular software product, but
there just is not an option to sit on the
sidelines in this case. Visual Studio
2012 brings too much to the table for
developers and development teams to
ignore it.

Find this story at http://sdt.bz/37266




Requirements management:

Changing paradigms 

Applications for cloud, mobile and social 
have much in common but also differ 



BY SUZANNE KATTAU 


In this new paradigm of cloud computing,
mobile software development
and agile distributed teams,
it’s more challenging than ever before
for software development teams to
manage requirements. These days,
organizations maintain Web apps, desktop
apps and mobile apps. Business logic
often overlaps in each. But due to
their specific nature, each type of app
can come with its own set of requirements.
Also, each can present development
teams with certain pressures
regarding how to collect and maintain
requirements.

Sometimes these are the same pres¬ 
sures, but they can be different depend¬ 
ing on the type of app. “Business rules— 
the statements or facts by which 
businesses run—are universally the same 
for a business regardless of the platform 
on which the application is being devel¬ 
oped,” said Ashu Potnis, vice president of 
product management and technology at 
requirements definition and manage¬ 
ment tool provider Techno Solutions. 


“What differs is the user interface and
system behavior. However, copying
requirements and business rules into
multiple projects is not a good idea
because now you have the same business
rules in multiple places. They quickly go
out of sync with each other.”

According to experts, the pressure 
that these individual application teams 
are experiencing is really around visibil¬ 
ity. “There is an overlap that occurs,” 
said Derwyn Harris, cofounder and 
solutions architect at Jama Software, 
which specializes in requirements man¬ 
agement. 

“The fact that I’m building for mobile 
or for the Web or for the desktop doesn’t 
really change my process. It’s still going 
to require the standard process that we 
go through to define requirements and 
build out stories if we’re agile. The 
approach is fairly universal. There isn’t 
necessarily a different pressure on these 
different types of applications.” 

The pressure of collecting and main¬ 
taining requirements really comes, 


Harris said, when an organization is try¬ 
ing to maintain these different applica¬ 
tions in parallel. “So, at that point in 
time, the pressure is really more about 
how do we stay in alignment to ensure 
that we’re taking advantage of function¬ 
ality that all of these applications uti¬ 
lize,” he said. “So it really comes down 
to visibility—ensuring that all the dif¬ 
ferent teams have visibility into what 
the other teams are working on.” 

Harris said that development team 
members can use a requirements man¬ 
agement tool such as Jama Contour’s 
Review Center to send a set of require¬ 
ments to both the team and stakehold¬ 
ers to review and provide feedback so 
that everyone stays on the same page. 

For all types of apps, missed and 
misunderstood requirements are often 
two of the major issues when it comes 
to product and/or project require¬ 
ments. “These two problems tend to 
get exacerbated with distributed 
teams,” Potnis said. “Also, with the fast 
iterative process of agile teams,
requirements become front and center. 
In fact, each agile iteration is started by 
first selecting the user stories or use- 
case scenarios to be accomplished in 
that iteration. And user stories and use- 
case scenarios are, essentially, require¬ 
ments.” 

See and act on the requirements 

In general, the solution to the require¬ 
ments issue, according to Potnis, is to 
communicate them using visual tech¬ 
niques. Use less text and more diagrams, 
screen mockups and application simula¬ 
tion to get your point across. “Our 
TopTeam Analyst, for example, helps 
business analysts and requirements 
engineers by automatically converting 
textual requirements into activity dia¬ 
grams and flow charts, as well as provid¬ 
ing a full complement of visual tools,” he 
said. Some of TopTeam Analyst’s visual 
tools include screen mockups, applica¬ 
tion walkthroughs and simulations, and 
business-process modeling. 

With TopTeam Analyst, Potnis said 
users are able to create project branch¬ 
es similar to the branching process that 
is common to source-code control sys¬ 
tems. Along with branching, he said 
users can also share the same require¬ 


ments and business rules into different 
projects and thus avoid duplication. 

Whether developers are creating 
Web apps, desktop apps or mobile 
apps, speed and time-to-market are the 
biggest pressures they face when trying 
to collect and maintain requirements. 
“The biggest challenge right now that’s 
facing companies—big, small, public, 
private, government, ourselves, our 
peers, all companies in every country 
that we work in—the No. 1 issue facing 
them today is time to market,” said 


Pete DuPre, chief solutions architect at 
development toolmaker Borland, a 
Micro Focus company. “When you 
break that down into the different 
types of apps that they’re trying to 
commercialize, the need for speed in 
the tools and the processes that they’re 
using is the number one challenge fac¬ 
ing teams today.” 

DuPre said that the dynamics of 
mobile, social and cloud are driving the 
need for organizations to change and 
react more rapidly to incoming changes 
from the business. “For the require¬ 
ments definition and management 
troops in every one of these organiza¬ 
tions, the challenge is how are they 
going to collect daily change requests, 
respond to them, and manage all of 
those requirements,” he said. 

What mobile, social and what
DuPre calls the “post-PC era” have
caused is the need for organizations to
put the requirements elicitation, definition,
and management processes and
technologies under the microscope, he
said. “Organizations need to ask them¬ 
selves if they have the processes, tools, 
partnerships and the expertise that they 
need for their organization to be able to 
react to changing requirements and 
changing market dynamics as fast as the 
consumers who are typically using 
these Web and mobile apps,” he said. 

Companies can use a requirements-management
tool such as Borland’s
Caliber to make sure they can react
quickly to satisfy their business users
as fast as those requirement changes
come in.

Borland's Caliber can quickly manage requirements as they change.

Seapine's TestTrack RM also facilitates the sharing of requirements among teams.

When it comes to Web, desktop and 
mobile app requirements management, 
the “obvious consequence of maintain¬ 
ing different variants in requirements, 
specifications and codebases is an 
increased development complexity and 
cost,” said Stefano Rizzo, VP of strategy 
and business development at Polarion 
(an ALM software provider). “But, 
besides increasing complexity of 
requirements and specifications, mobile 
apps imply an even shorter require¬ 
ments and specification life cycle. The 
time that a requirement lives before 
being involved in a change process is 
shorter than ever.” 


The challenge that software teams 
need to face, Rizzo said, is in maintain¬ 
ing reliable traceability throughout the 
application life cycle—from require¬ 
ment to specification to test to code— 
in order to quickly analyze the impact 
of a change and be instantly ready to
make the change happen.

Another big challenge to collecting 
and maintaining Web, desktop and 
mobile app requirements stems from 
the fact that apps tend to have different 
development schedules and timelines. 
“Even though they have these often 
overlapping requirements, they’re 
often working on their own heartbeat 
and their own really different sched¬ 
ule,” said Jeff Amfahr, director of prod¬ 
uct management at Seapine. “So keep¬ 
ing those in sync can be pretty tough. 


Having some ability to manage require¬ 
ments that you might have implement¬ 
ed on some platforms and not on oth¬ 
ers—which will affect all of them—is a 
big challenge that we see often.” 

Amfahr said that a lot of times, dif¬ 
ferent types of apps have development 
teams who are working off really differ¬ 
ent methodologies and approaches. 
Supporting those different approaches 
and styles is another big challenge he 
sees organizations facing. 

“Where sometimes the desktop 
application team is following maybe a 
waterfall or a modified iterative 
process—and they’re pretty happy with 
that and they’re cranking along on 
that—you’ve then got the mobile teams 
using some Scrum approach,” he said. 
“So they want user stories and the Web
team wants more fleshed-out requirements.
So, again, although they may be
the same requirements, the way they 
want to see them and the way they want 
to interact with them can be pretty dif¬ 
ferent.” 

A good requirements-management 
tool should let teams link a lot of differ¬ 
ent things together that are related but 
may not be exactly the same thing. It
should let teams automatically flag
related requirements when they need 
to be updated. Amfahr said that Seapine’s
TestTrack RM tool gives all the
teams the ability to share requirements 
with one another, as well as provide 
support for linking things. 

“Again, it might be the same require¬ 
ment, but the desktop team needs a lot 
more detail about how it’s going to be
working, whereas the mobile team 
wants that broken into six very small 
user stories because they’re going to be 
releasing this over three separate releas¬ 
es,” Amfahr said. “Although it’s the 
same big requirement, you’ve got these 
six small ones the mobile team’s work¬ 
ing on. So the tool has really good sup¬ 
port for linking those things so people 
can know this is all related to each oth¬ 
er, so, as things change, all those things 
can be updated.” 

One of the biggest challenges of 
managing requirements for mobile app 
development is the context in which 
those apps are used. “You’re not devel¬ 
oping or designing an app for a user 


who is sitting at a desk,” said Peter
Indelicato, senior product manager at
iRise (a visualization software company).
“You could be defining and developing
an app for a doctor who’s running
through a crowded hallway in a hospi¬ 
tal, for example, trying to type some¬ 
thing onto an iPhone or an iPad while 
they’re being bumped, shoved and 
yelled at by patients and other doctors.” 

The context for the usage presents 
interesting challenges when it comes to 
managing requirements, Indelicato 
said, because those requirements that 
come out of that contextual usage are 
tougher to find than your typical 
requirements. This is because you’re 


not talking about a situation that most 
developers are familiar with, which is 
somebody sitting at a desk. 

“For a doctor running through a hall¬ 
way, there might be some very impor¬ 
tant requirements about the size of the 
controls on the iPhone app or the text 
size, for example, because he’s reading it 
while he’s running,” he said. “Those 
types of requirements rarely come up 
when you’re reading a textual use case 
in a meeting room or if you’re looking at 
some static mockups up on a projector. 
Those types of requirements don’t come 
up because you’re out of context.” 

Polarion Requirements integrates change requests into a developer's backlog.

What’s critical and challenging for
business analysts, designers and product
owners to be able to do, Indelicato
said, is to run the simulation on the 
device itself, meaning on the actual 
iPhone so that the end user (the doctor 
in this case) can get the most realistic 
experience possible. “And by realistic, I 
don’t just mean the app that’s on the 
phone. I mean actually using it in the 
environment that the end application is 
intended to be used in,” he said. 

“One of the key ways to satisfying 
requirements, in the case of developing 
mobile apps, is to take the simulation, 
run it actually on the device, and then 
give that device to one of your intended 
users to see them use it in the context of 
that mobile environment. And of 
course they give you feedback, you 
observe what works and what doesn’t, 
and then you iterate.” 

Indelicato said that one of the com¬ 
ponents within the iRise Enterprise 
Platform is a tool called iRise Mobile. 
The tool lets you run your simulations 
on the device itself, for example, on an 
iPhone or an iPad. 

Requirements in a cloud world 

Cloud development and agile develop¬ 
ment allow for much faster updates to 
software. No longer do teams need 18-month
development cycles to collect
requirements and build out apps. Feed¬ 
back to apps is also received much faster 
now. These new paradigms affect how 
often requirements are collected and 
how they are addressed in various ways. 


Mastering short iterations and being 
able to successfully collect customer 
feedback are two key success factors for 
development teams, according to Polarion’s
Rizzo. “Cloud-based apps and
mobile apps reach an increasingly larger
number of users,” he said. “Beside
providing tools to their users that allow
an easier communication between
users and developers, development
teams are now starting to worry about
the ‘Big Feedback’ problem.”

Customers and their feedback must 
be part of the development environ¬ 
ment, Rizzo said. Customer feedback 
must be collected and analyzed in the 
same platform where requirements and 
development artifacts reside. “Modern 
ALM platforms such as Polarion 
Requirements allow feedback and 
change-request population by users 


directly into developers’ backlogs,” he 
said. “This is where they can be ana¬ 
lyzed, evaluated, selected and imple¬ 
mented.” 

The quicker development cycles of 
cloud and agile development now 
require faster review and approval 
cycles, TechnoSolutions’ Potnis said. By 
using TopTeam Analyst, he said users 
are able to create review packages and 
send small sets of requirements for 
review via TopTeam’s Online Review
and Approvals system. “Users can also 
request digital signatures via TopTeam 
Analyst,” he said. “This avoids having to 
author large, monolithic business 
requirements documents or product 
requirements documents.” 

With cloud and agile development, 
the requirements collection phase 
often overlaps with ongoing develop¬ 
ment, which is a big change from the 
traditional way when developers had a 
lot of time to get the requirements, col¬ 
lect feedback and validate those 
requirements via storyboarding and 
other approaches. “I think the big 
change is that you now need to gather 
that feedback based on sometimes the 
actual software or small pieces of the 
software, small parts of the requirement
that somebody has,” said Seapine’s
Amfahr. “Making the right parts of
the requirements visible to the cus¬ 
tomer, the end user, or whoever is in 
the stakeholder role there, is pretty crit¬ 
ical. Again, in the past, you sort of got 
them all, everyone agreed to it, and 
then you went off and worked on it.
Now you’re exposing more of that in
the middle.” 

Get feedback from everywhere 

It’s also important that business stake¬ 
holders are brought into the process 
smoothly so that the quick, iterative 
style of development seen in cloud and 
agile development can be maintained. 
“This is critical. Stakeholders should 
and must be involved,” said Jama’s Har¬ 
ris. “It’s ironic that this is a problem 
because, really, the Agile Manifesto was 
not so much saying we need Scrum or 
we need Extreme Programming or we 
need a new methodology. It was saying 
that we need a new mindset around 
how we interact with the stakeholders 
and how we get the software out there.” 

Stakeholders can be given access to 
a development team’s requirements- 
management tool such as Jama’s Con¬ 
tour. “There’s flexibility built into the 
solution. Certainly, in the best-case sce¬ 
nario, you want your stakeholders to be 
able to simply log in, and they can do 
that in Contour,” Harris said. He added 
that stakeholders can see a project, as 
well as what stories have been built and 
what requirements those are tied to. It 
all depends on how much visibility the 
development team wants to give them. 

Making stakeholders aware of the 
process and giving them a chance to see 
it in order to help re-prioritize things as 
you’re going along is the value of the 
shorter cycles in cloud and agile develop¬ 
ment. But, according to Amfahr, it’s not 
just about letting stakeholders see the 
requirements, it’s about letting them see 
everything about the requirement, which 
includes the cost of changes. This lets 
them help the development teams to 
make better decisions around what to do. 

“Visibility of the requirements 
throughout the application life cycle is 
critical,” Amfahr said. “It’s about letting 
[stakeholders] see things like, how big 
are these requirements? What’s the cost 
of these requirements? What’s the cost 
of the other things that go with it so 
they can have a better conversation 
around them?” 

Customers often now use Facebook, 
Twitter or other social-media sites to 


mention problems they have encoun¬ 
tered with apps. Because of this, it is 
vital for organizations to monitor social 
media to make sure that any problems 
about a user’s experience with an app 
aren’t missed. “This is absolutely and 
unconditionally important. An organi¬ 
zation that doesn’t have active and 
proactive monitoring of the social net¬ 
work has its head in the sand,” said Bor¬ 
land’s DuPre. “If you’re not doing it, 
then get out of the way because your 
competitors are going to go right 
through you. It is a head-in-the-sand, 
novice mentality not to do so.” 

As with any discipline, social reputation
is important in application development,
Rizzo said, and the practices
about how to filter and evaluate social
feedback are the same as they are in
other fields of human activities. “But
the key difference is that, once filtered
and selected feedback is collected, we
have to put it properly into the development
life cycle,” Rizzo said. “As social
media lives on mobile, cloud and Web
platforms, it will be very important for
developers to use requirements-management
platforms open to mobile,
Web and cloud.” One of these modern
platforms, he said, is Polarion Requirements.

It’s also important that organizations
assign the appropriate person or people
to monitor social-media feedback on
your apps. “In today’s connected social
world, it is important to stay on top of
the chatter and get feedback from wherever
you can,” Potnis said. “And obviously
someone needs to monitor these sites;
it should be the product manager or
someone from his or her team.”

If necessary, these problems need to
be quickly corrected, and the resulting
requirements should be able to easily
make it back into the development
process. With TopTeam Analyst, Potnis
said you can track these feature
requests because it is fully configurable.
“These social comments can then be
traced into your product or application
requirements for allocation into the
next development iteration,” he said.
“This way, you have full traceability
from request to test case, and nothing
falls through the cracks.”

A stakeholder, whether he or she is
in project management, a business analyst
or represents the voice of the customer
internally, should be the one
monitoring social-media feedback,
according to Amfahr. But his advice to
development teams is to be careful
about listening to social-media feedback
because it’s really easy to get
sucked into listening to niche customers
and niche users. You can spend
all your time chasing individual customer
needs and not your most important
customers’ needs, he said. “Use a
good tool that lets you not only get that
social-media feedback, but see how frequently
you are hearing it and from
what kinds of customers.

“It’s not just about making sure 
you’re responding to social-media feed¬ 
back, but making sure you put it into a 
bigger context,” Amfahr continued. 
“Feedback needs context: Who’s doing 
it? How often are you hearing it? From 
what kinds of users are you hearing it? 
Being able to pull all that feedback in 
and pull it together is important. I think 
that’s a big difference between using a 
tool like TestTrack RM rather than 
Microsoft Word; you can put that context
around it.”

□ Find this story at http://sdt.bz/37263 
You’ve undoubtedly heard of Apache Hadoop,
a framework for distributed processing, and
the Map/Reduce pattern it has helped make
famous. But “Map and Reduce” is a general pattern,
not a framework-specific technology. “Map”
means “Do some data processing to every element
in your collection”; “Reduce” means “Walk over
your collection of data (such as that produced by
the ‘Map’ step) and summarize or coalesce the
results.” For instance, “Map” all the photos you
took on your vacation by taking a quick look at
them and marking them as blurry or sharp.
“Reduce” them by deleting the blurry ones.
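
As a minimal sketch of the photo example, assuming each photo already carries a numeric sharpness score (the `Photo` class, the `sharpness` field and the 0.5 threshold are hypothetical, chosen only for illustration), the two steps might look like this in Python:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    name: str
    sharpness: float  # hypothetical score between 0 and 1


def tag_photo(photo, threshold=0.5):
    """Map step: label one photo, looking at nothing but that photo."""
    label = "sharp" if photo.sharpness >= threshold else "blurry"
    return (photo.name, label)


def keep_sharp(tagged):
    """Reduce step: walk over the mapped results and drop the blurry ones."""
    return [name for name, label in tagged if label == "sharp"]


photos = [
    Photo("beach.jpg", 0.9),
    Photo("bigfoot.jpg", 0.3),
    Photo("sunset.jpg", 0.7),
]

tagged = list(map(tag_photo, photos))  # Map: one label per photo
album = keep_sharp(tagged)             # Reduce: summarize the labels
```

Note that `tag_photo` makes only the blurry-or-sharp call; what to do with a blurry photo is left to the separate Reduce step.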

Two things are important: first, make the Map 
data processing as discrete and rapid as possible. In 
the case of triaging photos, don’t look at the first of 
2,000 photos and decide whether to put it in the 
photo album you share with your friends, just 
decide whether it’s blurry or not. 
The second important rule is to 
keep the “Reduce” step separate 
from the Map step. Maybe it 
turns out that the only photo you 
have of Bigfoot is a little blurry; 
value decisions are often hard to 
make without the context of the 
entire calculation. 

This implies another non-obvious aspect of the 
Map/Reduce approach: Map/Reduce is really Map 
and Reduce and then Map some more and Reduce 
some more and then save that and return later to 
Map a little more, etc. With digital photography, 
success comes from an efficient and consistent 
manner to rate and tag your media, and then working
with those Map/Reduced datasets for different
projects (a “Highlights of Our Trip” album versus an
“A Glimpse of Bigfoot” album).

The same principle holds true with Big Data:
Even if you have a hunch about the ultimate
answer you’re trying to derive, it is more likely to
emerge from incremental steps. Although ultimately
you may rerun your entire calculation from
scratch, it’s something to avoid during the development
stage: re-processing raw data over and over
again rather than an already-Map/Reduced dataset
is infuriating and wasteful. (On the other hand, you
must bear in mind the reductions you’ve already
applied and avoid re-deriving something you’ve
already discarded.)

I haven’t mentioned distributed processing yet,
but the reason why Map/Reduce has become so popular
is that the Map step, if done properly, is highly
parallelizable. The Map function applies the same
function to every element in a collection; it’s sometimes
called the “Apply” function, and LINQ calls it
the “Select” function. A properly written Map function
ought not have any loop-carried dependencies:
There ought to be nothing in it that requires information
from other collection elements or that is
dependent upon the order in which it is calculated.

Achieving such independence may require preprocessing
with other Map/Reduce sequences,
but the benefit gained is that a framework such as
Apache Hadoop can distribute the Map calculation
across multiple cores, chips or machines (this,
of course, becomes very framework- and problem-specific).
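
To illustrate why independence matters, here is a small Python sketch that uses the standard library's thread pool as a stand-in for a distributed framework. It is not Hadoop code, and the `word_length` function and sample words are assumptions made for the example. Because the Map function depends only on its own argument, running it across workers gives the same results as running it in a loop:

```python
from concurrent.futures import ThreadPoolExecutor


def word_length(word):
    """Map function: uses only its own argument, no shared or ordered state."""
    return len(word)


words = ["map", "reduce", "hadoop", "bigfoot"]

# Sequential Map: one element after another.
sequential = [word_length(w) for w in words]

# Parallel Map: elements may be processed in any order by any worker;
# Executor.map still hands the results back in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(word_length, words))
```

A Map function that read from a running total or from its neighbor's result would break this equivalence, which is the loop-carried dependency the text warns against.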

The Reduce step (called in other places
“Inject,” “Fold” or “Aggregate”) is not independent:
Once all the Map functions have been calculated,
it moves through the data and does such
things as remove duplicates, coalesce data, or collect
statistics (such as when you gather the number
and ratings of photos tagged “Bigfoot”).
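
The “Bigfoot” statistic can be sketched with Python's built-in fold, `functools.reduce`. The record layout below (a dict with `tags` and `rating` keys) is hypothetical and stands in for whatever the Map step produced:

```python
from functools import reduce

# Hypothetical output of an earlier Map step: one record per photo.
tagged_photos = [
    {"name": "a.jpg", "tags": {"bigfoot"}, "rating": 4},
    {"name": "b.jpg", "tags": {"beach"}, "rating": 5},
    {"name": "c.jpg", "tags": {"bigfoot", "forest"}, "rating": 2},
]


def accumulate(summary, photo):
    """Fold one photo into the running (count, ratings) summary."""
    count, ratings = summary
    if "bigfoot" in photo["tags"]:
        return (count + 1, ratings + [photo["rating"]])
    return summary


# Reduce: each step depends on the summary carried over from the previous one.
count, ratings = reduce(accumulate, tagged_photos, (0, []))
```

Unlike the Map function, `accumulate` needs the result of the previous step, which is why the Reduce step cannot be spread across workers as freely.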

Map, Reduce and the concept of using them in
sequence are straightforward. Hadoop is impressively
straightforward to get up and running, too.
But in between “Hello World” and winning a Kaggle
contest are quite a few steps.

The book “MapReduce Design Patterns” by
Donald Miner and Adam Shook is a good intermediate
resource and among my very technical books
of the year. It does not have the step-by-step
instructions of a “recipe” book, but I think that’s a
fine decision given Hadoop’s position as a fairly specialized
technology. By avoiding line-by-line breakdowns,
the book is able to deliver a lot of content in
its 436 pages. It starts with an approachable summary
in 30 or so pages, and then covers “Summarization
Patterns,” which I think is the logical training
field for Map/Reduce. I’m no expert in Hadoop, but
it seems to me the extensive discussion of “Join Patterns”
was as comprehensive as it was enlightening.

I highly recommend the book to anyone with 
even a passing interest in Big Data and distributed 
processing.
Graph databases are the next big thing

As the world becomes more connected, so
does its data. The data-management tools
and techniques of the past are not equipped to
handle this non-uniform, semi-structured and
highly interconnected data. Faced with this dilemma,
software professionals have the option of
struggling to fit everything into relational or key-value
models, or instead to embrace the graph data
model and steer their teams toward success.

It’s not always an easy decision to throw away
decades of experience and replace relational databases
altogether, but it is straightforward to move
the highly connected parts of the system to a graph
database. The size and complexity of the data is
inevitably fueling a movement toward polyglot persistence
(data housed in the stores best equipped
to handle it) instead of a single (relational) database.
While polyglot requires a broader understanding
of data issues, it yields tremendous benefits
for data architecture and governance.

To put the drivers for graph database adoption
into context, we see that valuable data is generated
by the spread of social networks on the Web. The
social graph, which entangles the intent, interest and
consumption graphs, is driving online commerce and
advertisement. Furthermore, the latest advancements
in smartphones are growing the mobile graph,
tying the online world back to the real. The data generated
is already in graph form as people friend each
other and buy things. In other words, they are creating
relationships, the very thing graph databases
excel at.

Graph databases have a simple structure, using 
just two types of objects: nodes and relationships. 
Users, things and places are modeled as nodes, and 
various kinds of labeled, directed relationships are 
created between them. These simple structures form 
paths that can then be easily used to compute which 
of your friends should also be friends, or what new 
music your friends are buying. The data is connected 
in simple ways, but traversals of these connections 
can answer very sophisticated questions. 
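
As a rough illustration of the node-and-relationship model, the Python sketch below stores labeled, directed relationships as plain tuples and answers the two questions mentioned above. It is an in-memory toy, not the API of any graph database; the names, the `FRIEND` and `BOUGHT` labels, and the choice to read a `FRIEND` relationship in both directions are all assumptions made for the example.

```python
from collections import Counter

# Each relationship is (source node, label, target node).
relationships = [
    ("alice", "FRIEND", "bob"),
    ("alice", "FRIEND", "carol"),
    ("bob", "FRIEND", "dave"),
    ("carol", "FRIEND", "dave"),
    ("carol", "FRIEND", "erin"),
    ("bob", "BOUGHT", "album_x"),
]


def friends(person):
    """Nodes connected to `person` by a FRIEND relationship in either direction."""
    outgoing = {t for s, r, t in relationships if r == "FRIEND" and s == person}
    incoming = {s for s, r, t in relationships if r == "FRIEND" and t == person}
    return outgoing | incoming


def suggest_friends(person):
    """Friends of friends who are not yet friends, ranked by mutual friends."""
    direct = friends(person)
    counts = Counter(
        candidate
        for friend in direct
        for candidate in friends(friend)
        if candidate != person and candidate not in direct
    )
    return counts.most_common()


def bought_by_friends(person):
    """Items reached by following FRIEND and then BOUGHT relationships."""
    return {
        t
        for friend in friends(person)
        for s, r, t in relationships
        if r == "BOUGHT" and s == friend
    }


suggestions = suggest_friends("alice")
purchases = bought_by_friends("alice")
```

Both answers come from walking short paths outward from one node, which is the kind of traversal the paragraph above describes.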

We can query highly connected data using tailored
graph query languages that are designed to
express intent over graph structures. A graph query
language is much simpler than trying to describe
such connections with hundreds of lines of SQL
code. Finding how things are connected reaches
out beyond basic social graphing and into areas like
matching people to jobs based on skills and experience,
patients to diseases based on symptoms, suspects
to crimes, sellers to prospects, borrowers to
lenders, etc.

Another benefit is speed of development. The
schemaless nature of graph databases enables developers
to rapidly evolve a system as requirements
change to align with business needs. Adding a new
type of node or relationship is typically painless, and
it’s a more natural modeling approach than creating
a bunch of join tables and the logic to handle them.

In terms of modeling approach, graph databases
are whiteboard-friendly, since business experts can
communicate with developers by simply drawing
nodes and relationships, which can then be modeled
directly in the database, yielding no semantic
gap between the user’s data model and how it is
expressed in the database. Conversely,
with a relational or key-value
store, the data model does
not resemble the domain.

While some might initially be
apprehensive that graph thinking
is a new technique to learn, in fact
it’s a very natural technique that
we simply need to rediscover. But once exposed to
graph databases, it’s common to start seeing graphs
everywhere, a testament to how natural and powerful
the paradigm is.

Once we’re rewired for graphs, we find that
graph databases are also able to scale to solve problems
in ways other data models can’t. For example,
the time required to traverse and return a subset of
nodes and relationships remains proportional to
the resultant set size (i.e., constant), even as the
overall database grows rapidly. Attempts to replicate
a graph within relational databases can work at
trivial sizes, but performance quickly degrades and
becomes unpredictable as the data set size or number
of joins increases.

Relational databases and key-value stores aren’t
going away anytime soon, but they are impractical
for connected data. Graph databases are ideal to help
solve sophisticated data problems found in social
relationships and highly interconnected systems. It is
a new (old!) way of solving problems, and we’re finding
many innovative ways of applying it every day.


Emil Eifrem is founder 
of the Neo4j graph 
database project, and 
CEO of Neo Technology. 




□ Find this story at 
http://sdt.bz/37139 
Analyst View

BY MARGO VISITACION

Build your best mobile app

□ Find this story at
http://sdt.bz/37265

“We need a mobile application NOW!”
Does this sound familiar? Companies
around the globe are driving application develop¬ 
ment teams to develop mobile apps faster than 
ever before. And we’re not talking about just any 
old mobile applications. Customers expect and 
demand usable, useful and engaging mobile expe¬ 
riences. They have a voice, and competition makes 
their opinions even more important. 

To make things more complicated, it’s no secret 
that mobile apps live and die by their ratings in an 
app store, as Jeffrey Hammond wrote recently in 
SD Times. When quality is low, the rating suffers. 
And when the rating suffers, the download rate 
suffers. With more than 500,000 mobile apps in 
the Apple App Store and more than 450,000 in the 
Google Play store, mobile developers can no 
longer afford to skimp on quality. 

For Forrester’s Mobile App
Development Playbook, we
asked mobile development vendor
and user firms to share their
processes, tools and best practices
when it comes to successful
mobile applications. The (not-so-secret)
secret? Good-quality
applications make happy customers, and happy
customers differentiate you from the competition.

So how can developers ensure the quality of
their mobile applications and optimize testing
strategy? In the Playbook, we recommend the following
best practices:

Prioritize testing in the real world. Brands
live and die by how their customers view their
apps. What better way to appease them than to get
them involved in the development process?
Whether this involves location-based testing,
crowdsourcing or traditional feedback, the customer
is the one who will use the app, so the customer
should have a say in the features and functionality.
Customer involvement can include:

1) Feedback: In an ideal world, companies would 
know what their customers want, when they want it 
and how they want it. So for mobile app developers, 
make each day a potential election: Customers who 
vote on which functions to include or features to 
drop can provide fantastic input. But it’s more than
just building the right features: Customers now feel
that they own a piece of the app. By allowing customers
to test the features in which they are interested,
mobile app development teams can make loyalty
a natural part of the development process.

2) Location testing: Apps that have location-based
components should be tested in different
locations. This is especially important if you are
building a navigation app that guides users to the
nearest store or clinic, as mistakes in these apps
can be costly and result in people getting lost.

3) Crowdsourcing: It’s time for developers to 
take testing to the crowd, allowing groups to report 
on their app experience. Crowdsourcing is also a 
great way to gather data on how your app performs 
in “real-world” 2G, 3G, even 4G environments. 

Embrace agile practices and build not only 
for today, but for tomorrow. Technology is 
always in a state of flux, but in the mobile space, the 
rate of change is even greater. The key to adapting 
to the world of mobile testing is agile. Built to support
the unknown, agile development methodologies
allow teams to incrementally build software
while simultaneously learning from experience and 
meeting market demands. Why reinvent the wheel 
each time you build an application? Consider the 
new devices that will be released in six months. 
Design mobile software with change in mind. 

Mix virtual and physical testing. Neither purely
virtual nor purely physical testing works. Physical
testing is time-consuming and requires numerous
devices and plans, while virtual testing never gives a
complete picture. By combining the two approaches,
application development professionals can use virtual
testing to identify the majority of bugs, and then use
physical testing to focus on user and preproduction concerns.

Adopt automation practices. Manual testing
only works when organizations start small, supporting
a limited number of devices and platforms. However,
as soon as their target audience expands, device
support must expand as well. In order to reduce the
time and cost associated with traditional testing, 
developers must start adopting automation practices. 
As one executive responsible for mobile testing told 
us, “We have always found automation difficult for 
our Web apps and have never had the incentive to 
introduce automation. But with mobile, we don’t 
have that option.” Automation is required from day
one to fulfill the “release early and often” mantra.
Industry Watch


BY DAVID RUBINSTEIN 


Big Data, social, mobile got the money 


David Rubinstein is 
editor-in-chief of SD Times. 


Your applications will have to 
be able to run in the cloud, 
on multiple devices, and with
a fantastic user experience. 


□ Find this story at 
http://sdt.bz/37273 


While most of the world’s economy continued
to vacillate between slow growth and brink 
of recession during 2012, the technology sector 
seemed not to notice, as money flowed like wine 
for acquisitions and growth. 

Unlike past years, though, when infrastructure 
deals commanded major dollars, 2012 was led by 
social media. (One area of infrastructure that DID 
see investment was in the cloud, in building out 
the clouds themselves, and storage solutions to 
hold all that cloudy stuff.) 

Despite Facebook’s IPO flop (it fell from an
opening price of US$43 per share into the teens 
before rallying back to near $30), 
the company did manage to find 
$741 million amid the rubble to 
buy image service Instagram. 

Meanwhile, Microsoft also 
spent on social computing, 
shelling out $1.2 billion for social 
networking software Yammer. It 
then spent the better part of the second half of 2012 
laying out its vision for putting Yammer into all its 
software and redefining how people will work. 

Google, which surpassed Microsoft for second 
place in market capitalization among technology 
companies (but remains behind Apple), made $300 
million in investments in 2012 through its Google 
Ventures arm, with 32% of its investments going into 
mobile, and 31% going into the consumer Internet. 
And, notably, Google acquired social marketing software
provider Wildfire for $350 million.

In the mobile space, API management company
Apigee paid an undisclosed sum for mobile app-payment
processing technology from the Wholesale
Applications Community, a mobile software
development organization.

Big spending, though, wasn’t limited to social 
media and mobile. IBM made a dizzying number 
of deals in 2012, including the $1.3 billion it spent 
for talent management software maker Kenexa. 

The NoSQL and Big Data markets saw plenty of 
action, with IBM again involved. It bought discovery 
and navigation software provider Vivisimo to meld 
into its existing Big Data analytics software. Meanwhile,
MongoDB developer 10gen received funding
from Intel Capital and Red Hat in November, after
securing $42 million in a financing round led by New
Enterprise Associates. 10gen reported that before
the November stake, the company had raised more
than $73 million since 2007.

Also receiving an influx of cash was DataStax, the 
company built around support for NoSQL database 
Cassandra. Funding came in at $25 million from 
such companies as Crosslink Capital, Lightspeed 
Venture Partners, and Meritech Capital Partners. 

In November, Big Data analytics startup Sumo 
Logic managed to wrestle $30 million from 
investors. Accel Partners led the way, along with Sutter
Hill Ventures and Greylock Ventures. Sumo Logic
should now be better placed to compete with rival
Splunk, which showed a 56% increase in license revenue
for its fiscal third quarter. And the company
also announced an expansion that will have it occupying
the entire former Gallo Salame factory in San
Francisco. Talk about a meaty business! 

So what does it mean for developers? It means 
what we’ve been reporting all year: Your applications
have to be able to run in the cloud, on multiple
devices, and with a fantastic user experience.
The tools and technologies to do this have arrived.
Are your skills up to the task?


Events Calendar 




DATE               SHOW                        CITY             SPONSOR                      LINK
Jan. 16-17         Open Compute Summit         Santa Clara      Facebook                     opencompute.org/summit-2013
Jan. 27-31         Lotusphere                  Orlando          IBM                          www.ibm.com/connect
Jan. 31-Feb. 2     Macworld/iWorld             San Francisco    IDG World Expo               www.macworldiworld.com
Feb. 15-16         Snow*Mobile 2013            Madison, Wis.    Sapling Events               snow-mobile.org
Feb. 25-March 1    RSA Conference              San Francisco    RSA                          www.rsaconference.com
Feb. 26-Feb. 28    Strata Conference           Santa Clara      O'Reilly Media               strataconf.com
March 3-6          SPTechCon San Francisco     San Francisco    BZ Media                     www.sptechcon.com
March 13-21        PyCon Python Conference     Santa Clara      Python Software Foundation   us.pycon.org/2013


For a more complete calendar of U.S. software development events, see www.sdtimes.com/calendar. Information is subject to change. Send news about upcoming events to events@bzmedia.com. 
Big Data gets real
at Big Data TechCon! 


Discover how to master Big Data from real-world practitioners - instructors who 
work in the trenches and can teach you from real-world experience! 



Come to Big Data TechCon to learn 
the best ways to: 


• Collect, sort and store massive quantities of 
structured and unstructured data 


• Process real-time data pouring into your organization 


• Master Big Data tools and technologies like Hadoop, 
Map/Reduce, NoSQL databases, and more 


• Learn HOW TO integrate data-collection 
technologies with analysis and business-analysis 
tools to produce the kind of workable information 
and reports your organization needs! 


• Understand HOW TO leverage Big Data to help 
your organization today 



BigData 

— TECHCON 



April 8-10, 2013
Boston, MA 

www.BigDataTechCon.com 


Register Early and SAVE! 

A BZ Media Event 

Big Data TechCon™ is a trademark of BZ Media LLC. 
Chairman’s Letter





Dear colleague, 

The amount of data our organizations collect is growing exponentially,
and the more data we have, the greater the opportunities
we have to use that information. Big Data can improve
efficiency, reduce waste, empower employees and customers,
increase competitiveness, and benefit the bottom line.


The tangible benefits of Big Data analytics are 
well known. You can read about them in the IT 
press, and also in business journals and the daily 
newspaper. Many books have been published 
about the “why” of Big Data. Conferences devoted
to exploring the trends are happening
everywhere.



We understand the "why" of Big Data, the 
practical reasons for storing, searching, sharing, 
analyzing and reporting through these gigantic data sets. 


Alan Zeichick 


Big Data TechCon isn’t a “why” conference. It’s the HOW-TO
conference for Big Data. Practical workshops. Technical classes.
Thorough examinations of the real-world choices in storage, processing,
analysis and reporting of Big Data information. Strategies
for rolling out Big Data projects in your organization.


Come to Big Data TechCon to learn HOW TO accommodate
the terabytes and petabytes of data from your Web logs, social 
media interactions, scientific research, transactions, sensors and 
financial records. Learn how to index, search and summarize the 
Big Data. Learn how to empower employees, inform managers, 
reach out to customers. 


Big Data TechCon is technology-agnostic. The workshops and 
classes apply to Big Data in your data center or in the cloud, from 
hosted environments to your own servers. The sessions apply to 
relational databases, NoSQL databases, unstructured data, flat 
files and data feeds. 

The faculty have real-world experience that you can tap into, 
whether you use Java, C++, .NET or JavaScript; whether you like 
MySQL, SQL Server, DB2 or Oracle; whether you love or hate 
Hadoop, HBase, Cassandra or Pig; and whether you are looking at 
dozens of terabytes or hundreds of petabytes. 

Produced by BZ Media, publisher of SD Times, the leading
magazine for software development managers, Big Data TechCon
is the biggest, most info-packed, most practical HOW-TO Big
Data conference in the world. No hype. All tech, all the time.

During the three-day conference, choose from 40+ workshops 
and technical classes at all levels, from overview to intermediate 
to advanced. Of course, the technical classes and workshops are 
only part of the benefit of attending Big Data TechCon. 

Learn from the smartest, hardest-working faculty in the Big
Data universe in a way you never could by reading a book or
watching a webinar. Mingle with fellow attendees. Talk shop during
meals and receptions. Be inspired by keynotes, be informed
by general sessions, be impressed by the hottest Big Data tools in
the Expo Hall. It’s all waiting for you.

See you in Boston! 

Alan Zeichick 

Conference Chairman 


The HOW-TO conference for Big Data
and IT Professionals! 

• Learn tips, tricks and techniques that will make 
you your company’s Big Data Expert! 

• Discover how to master Big Data from real-world 
practitioners—instructors who work in the trenches 
and can teach you from real-world experience 

• Hear about other related technologies that can 
help you with your Big Data projects: the cloud, 
efficient storage and warehousing methods, 
and more 

• Come to Big Data TechCon to master
Big Data: get practical answers to real problems,
learn tangible steps to real-world implementation
Event Schedule

Sunday, April 7

4:00 pm - 7:00 pm      Registration Open

Monday, April 8

7:30 am - 7:00 pm      Registration Open
7:30 am - 8:30 am      Morning Coffee
8:30 am - 10:00 am     Workshops
10:00 am - 10:15 am    Coffee Break
10:15 am - 12:15 pm    Workshops
12:00 pm - 1:15 pm     Lunch
1:15 pm - 3:00 pm      Workshops
3:00 pm - 3:15 pm      Coffee Break
3:15 pm - 5:00 pm      Workshops
5:15 pm - 6:30 pm      Lightning Talks

Tuesday, April 9

7:30 am - 7:00 pm      Registration Open
7:30 am - 8:30 am      Morning Coffee
8:30 am - 9:30 am      Technical Classes
9:30 am - 9:45 am      Coffee Break
9:45 am - 10:45 am     Keynote
11:00 am - 12:00 pm    Technical Classes
12:00 pm - 6:30 pm     Exhibit Hall Open
12:15 pm - 12:45 pm    Sponsored Classes
12:45 pm - 1:45 pm     Lunch Break
1:45 pm - 2:45 pm      Technical Classes
2:45 pm - 3:30 pm      Coffee, Ice Cream in Exhibit Hall
3:45 pm - 4:15 pm      Sponsored Classes
4:30 pm - 5:30 pm      Technical Classes
5:30 pm - 7:00 pm      Networking Reception in Exhibit Hall
8:00 pm - 9:30 pm      Fireside Chats


Wednesday, April 10

7:30 am - 4:00 pm      Registration Open
7:30 am - 8:30 am      Morning Coffee
8:30 am - 9:30 am      Technical Classes
9:45 am - 10:30 am     Keynote
10:30 am - 3:00 pm     Exhibit Hall Open
10:30 am - 11:00 am    Coffee Break in Exhibit Hall
11:00 am - 12:00 pm    Technical Classes
12:00 pm - 1:00 pm     Lunch Break
1:00 pm - 2:00 pm      Technical Classes
2:00 pm - 2:30 pm      Coffee Break & Prizes in Exhibit Hall
3:45 pm - 4:45 pm      Technical Classes
4:45 pm                Conference Closes





Special Events 

Monday, April 8 

5:15 pm -6:30 pm 

Lightning Talks 

Learn something new in a handful of short, targeted talks, 
PLUS names will be drawn for free giveaways. 

Tuesday, April 9 

9:45 am -10:45 am 

Keynote 

12:00 pm -6:30 pm 

Exhibit Hall Open 

Come visit the growing and evolving network of technical 
experts in our Exhibit Hall. 

2:45 pm-3:30 pm 

Coffee, Ice Cream in the Exhibit Hall 


5:30 pm -7:00 pm 

Networking Reception 
in the Exhibit Hall 

8:00 pm -9:30 pm 

Fireside Chats 

Wednesday, April 10 

9:45 am -10:30 am 

Keynote 

10:30 am -3:00 pm 

Exhibit Hall Open 

Come visit the growing and 
evolving network of technical 
experts in our Exhibit Hall. 

10:30 am -11:00 am 

Coffee Break in Exhibit Hall 

2:00 pm-2:30 pm 

Winner’s Circle prizes announced in Exhibit Hall 
Workshops

Overview 

A Case Study: Data Visualizations and Insight into How 
America Eats at Restaurants 

David Hwang 

This workshop will discuss the pains and lessons learned from
building out a database that collects data from thousands of
restaurants across North America on a nightly basis. The challenges
of integrating data from disparate point-of-sale systems,
data cleaning, data normalization, ETL and generating insight
will all be discussed.

Additional topics include: 

• Why choose open source? 

• Why pick Mongo? 

• What’s up with ZFS? 

We’ll wrap up by covering lessons learned and what’s next.

Level: Overview 

Getting Started with Cassandra EM 

Ben Coverston 

Unless you have experience with Google BigTable, HBase or 
Cassandra, column-oriented databases are probably an enigma. 
Cassandra’s data model is both simple and powerful. 

It takes some time to get used to the differences between the 
relational model and Cassandra’s column-based model. 

Cassandra is not schema-less, but we do not model relationships
in Cassandra either. Data Modeling in Cassandra usually
consists of finding the best way to denormalize the data when
you put the data in the database so that you can retrieve it quickly
and efficiently. This workshop will prepare you for success when
modeling your data. This tutorial will dive into Cassandra from a
developer perspective and give you the tools you need to get
started with Cassandra today.
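
The idea of denormalizing at write time can be sketched in plain Python. This is a conceptual sketch only and does not use Cassandra or CQL: the two dictionaries stand in for query-specific tables, and the names `posts_by_author`, `posts_by_tag` and `save_post` are hypothetical.

```python
from collections import defaultdict

# Each dict stands in for one query-specific table (hypothetical names).
posts_by_author = defaultdict(list)
posts_by_tag = defaultdict(list)

def save_post(author, tag, title):
    """Write the same post into every table that serves a read query."""
    row = {"author": author, "tag": tag, "title": title}
    posts_by_author[author].append(row)
    posts_by_tag[tag].append(row)

save_post("ann", "nosql", "Wide rows explained")
save_post("bob", "nosql", "Modeling time series")
save_post("ann", "hadoop", "Hive tips")

# Each read is a single key lookup; no join is needed at query time.
print([p["title"] for p in posts_by_tag["nosql"]])
print([p["title"] for p in posts_by_author["ann"]])
```

The trade-off is extra work and storage on every write in exchange for reads that never need to combine tables.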

This workshop will cover: 

• An introduction to Cassandra in the context of relational 
databases and non-relational alternatives 

• Best practices for modeling your data in Cassandra 

• Cassandra Query Language (CQL version 3) 

• Wide and composite columns 

• Practical examples 

• Anti-patterns (things to avoid) 

For a more advanced look at Cassandra, attend the “Apache 
Cassandra — A Deep Dive” class. 

Level: Overview 

Hadoop Data Warehousing with Hive EES 

Dean Wampler 

In this hands-on workshop, you’ll learn how to use Hive for 
Hadoop-based data warehousing. You’ll also learn some tricks of 
the trade and how to handle known issues. 

We’ll spend most of the workshop using a series of hands-on 
exercises with actual Hive queries, so you can learn by doing. 
We’ll go over all the main features of Hive’s query language,
HiveQL, and how Hive works with data in Hadoop. We’ll also contrast
Hive with relational and non-relational database options.

Hive is very flexible about table schemas, file formats, and 
where the files are stored. We’ll discuss real-world scenarios for 
the different options. We’ll briefly examine how you can write Java 
user defined functions (UDFs) and other plugins that extend Hive 
for data formats that aren’t supported natively. 

You’ll learn Hive’s place in the Hadoop ecosystem, such as how
it compares to other available tools. We’ll discuss data organization
and configuration topics that ensure best performance and
ease of use in production environments.

Side notes: This workshop is suitable for beginner data analysts
and software developers. Bring your laptop pre-installed with a
suitable secure shell (ssh) client, such as PuTTY for Windows.
Mac OS and Linux systems come preconfigured with ssh.
Some prior SQL experience will be assumed.

Level: Overview 


Intermediate 

Introduction and Best Practices for Storing and Analyzing
Your Data with Apache Hive EES

Mark Grover 

This workshop on Apache Hive will introduce Hive and the
best practices for storage and data analysis in it. Hive is an open-source
data-warehousing system built on top of Apache Hadoop
that lets you query, mine and analyze the data stored in
Hadoop clusters using familiar SQL-like queries.

This workshop will go through a hands-on exercise on how
users can use Hive queries to perform data analysis. Because not
all analysis can be expressed using SQL-like queries, the workshop
will cover how to write, test and use User Defined Functions
and User Defined Aggregate Functions in Hive.

This workshop will then go through some of the best practices 
related to partitioning, bucketing and joining various datasets in 
Hive. 

You will also learn how to leverage other technologies in the
Hadoop ecosystem, such as plugging Map/Reduce scripts from
Hadoop directly into your Hive queries, and how to integrate
HBase with Hive to share data across the two systems. The
workshop will wrap up with a question-and-answer session.

Note: This hands-on workshop requires you to bring a laptop.
It is also recommended to install an SDK beforehand (a link will
be provided).

Level: Intermediate 

Hands-on Google BigQuery and Google Predictive API MS 

Lynn Langit 

In this workshop you will learn the what, why, how, when and 
where around two of Google’s cloud data offerings. Here you will 
learn about Google BigQuery and the Google Predictive API and 
why you might consider using either of these cloud services. You 
will learn how to write effective, performant and useful queries 
when using either of these services. You’ll also get to wire up the 
services via their exposed APIs. The course will be taught in Java. 
The workshop’s objectives include: 

• Understanding the capabilities (and limitations) of Google 
BigQuery and the Google Predictive API 

• Writing and executing queries against BigQuery and the Predictive
API

• Understanding how to wire up the services into your application

• Understanding costs involved in using these cloud services 
from Google 

Prerequisite knowledge: RDBMS administration or development.
Data-mining experience is helpful but not required. You
should know Java.

Prerequisite software: You must have a Gmail account to sign 
up for these services from Google. 

Level: Intermediate 

Hands-on NoSQL for the DBA MS 

Lynn Langit 

In this workshop, you will learn the what, why, how, when and
where around non-relational (NoSQL) database technologies.
Here you will understand the different types of NoSQL data storage
options that are in popular use now. These include graph
databases such as Neo4j and Freebase, key-value stores like DynamoDB
and Cassandra, document data stores such as MongoDB,
and column data stores like Hadoop. You will learn about
and work with both locally hosted versions of these data stores
and cloud-based versions. Importantly, you’ll understand which
types of data stores will best suit your business and technical
requirements.

The workshop’s objectives include: 

• Understanding the capabilities (and limitations) of NoSQL 
databases 

• Seeing an example of each type of NoSQL database 

• Understanding basic query technologies for NoSQL data 
stores by working with Hadoop (working with Map/Reduce, 
Pig and Hive) 

• Getting hands-on experience installing, querying and per¬ 
forming basic administration with MongoDB 

Prerequisite knowledge: RDBMS development or administration 
(SQL Server references will be used in this talk for comparison). 

Prerequisite software: The class will be taught in Windows; students
can use a Mac, but step-by-step instructions will differ
slightly.

Level: Intermediate 


Introduction to Hadoop, Map/Reduce and HDFS for 
Big Data Applications MS 

Serge Blazhievsky 

This workshop will teach you how to solve Big Data problems
using Hadoop in a fast, scalable and cost-effective way. It is designed
for technical personnel and managers who are evaluating
and considering using Hadoop to solve data-scalability problems.

We will start with Hadoop basics and discuss best practices for
using Hadoop in enterprises dealing with large datasets. We will
look into the current data problems you are dealing with and potential
use cases of using Hadoop in your infrastructure. The presentation
covers the Hadoop architecture and its main
components: Hadoop Distributed File System (HDFS) and
Map/Reduce. We will present case studies on how other enterprises
are using Hadoop, and look into what it takes to get
Hadoop up and running in your environment.

Two case studies will cover near-real-time data-processing scenarios
and Hadoop cluster implementations for large clusters
(2,000 to 4,000 nodes). The near-real-time case study can be used
as guidance for building the infrastructure of a near-real-time architecture.
All components used in the architecture are open source
under the Apache license, and will provide cost-effective solutions
for solving Big Data problems.

By attending this tutorial, you will: 

• Understand Hadoop’s main components and architecture 

• Be comfortable working with Hadoop Distributed File System 

• Understand Map/Reduce abstraction and how it works 

• Understand components of a Map/Reduce job 

• Know best practices for using Hadoop in the enterprise 

We will demonstrate the real-life code for basic Map/Reduce
jobs and code for working with HDFS during the workshop.

Level: Intermediate 

Advanced 

Crash Course in Machine Learning MS 

Dean Wampler 

Machine learning is a broad topic with many categories of 
problems and approaches, as well as lots of special-purpose tricks 
of the trade. We can’t cover all that, of course, but we can give you 
a taste of the most common problems addressed by machine 
learning and some of the techniques used to address them. 

We’ll discuss supervised learning, where a system is trained
with data that has already been classified and is then used
to classify new data. Classification applies a set of labels to data;
we’ll use the particular example of classifying spam vs.
non-spam e-mails and experiment with a spam classifier. Finally,
we’ll finish this section with a brief discussion of regression,
where a value for a continuous number is assigned to data, such
as predicting housing prices from historical data.
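
As an illustration of supervised classification, here is a minimal naive Bayes sketch in Python. The brochure does not say which algorithm the workshop uses, so naive Bayes is an assumption, and the training messages, labels and function names are invented for the example.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability, using Laplace smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)  # prior for this label
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented, already-classified training data.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify("claim your cash prize", *model))
print(classify("monday meeting notes", *model))
```

The system is "trained" only by counting words in labeled messages; new messages are then scored against those counts.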

How does Netflix determine what movies to recommend to
you? How does Amazon know what products you might want?
We’ll examine recommendation engines that compare either user
preferences or item features to make recommendations for you.
We’ll experiment with a movie-recommendation engine.

Clustering is an example of unsupervised learning, where we
don’t train the system in advance. Instead, the system finds
structure in the data “on its own.” We’ll look at two examples:
k-means clustering, where you find k clusters and their centers in
a dataset; and nearest-neighbors, where you find the nearest
neighbors to a given data point in an efficient way. We’ll experiment
with k-means.
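
A minimal k-means sketch in Python follows, using Lloyd's algorithm on 2-D points. The sample points, the choice of the first k points as starting centers, and the function names are assumptions made for illustration; real implementations usually pick starting centers more carefully.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm; centers start at the first k points."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, clusters

# Invented data with two visually obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centers, clusters = kmeans(points, 2)
print(centers)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between points.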

We’ll finish with a brief discussion of topics you might pursue
on your own; the importance of data preparation, probabilities
and statistics; and more advanced machine-learning concepts,
such as probabilistic graphical models and neural networks.

Side notes: This workshop is good for intermediate to advanced
developers and data analysts. Some programming ability
will be assumed, such as using a text editor, writing simple scripts
in some language, and using simple Linux “shell” commands.

Level: Advanced

Hive, Pig, Cascading and Codd: A Crash Course in 
Map/Reduce Relational Languages via an 
Appeal to History ESZ3 

Daniel Eklund 

This workshop will teach you how to simultaneously implement
the relational operators as defined by famed computer scientist
E.F. Codd in Hive, Pig and Cascading. You, the developer,
will focus on the abstract concepts of the Relational Algebra in
order to learn ALL the languages simultaneously. Theory can
sometimes be dry, but, in this case, revisiting Codd’s original intentions,
and his seminal papers, can accelerate our learning
of these new Big Data Relational Languages:

• What are the Relational Algebra operators? Select, project, 
join, cross, union, intersect and divide 

• What is the Pig, Hive and Cascading syntax for these operators? 

• How do high-level languages and libraries like Pig and Hive 
compile these operators to Map/Reduce? 
• Why does the syntax of Pig, Hive and Cascading differ, and
what is each trying to emphasize or de-emphasize?

• Running Exercises: Practice HiveQL, Pig and
Cascading/Cascalog/Scalding concepts as they are introduced.

• When to use Pig, Hive or Cascading over another 

• Code and examples of each language 
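
Three of the operators named above (select, project and join) can be sketched in plain Python over lists of dictionaries. This is not Hive, Pig or Cascading syntax; the tables and function names are hypothetical and serve only to show what each operator does.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projection: keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def join(left, right, column):
    """Natural join on one shared column."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]

# Invented example relations.
employees = [
    {"name": "ann", "dept": 1},
    {"name": "bob", "dept": 2},
    {"name": "cho", "dept": 1},
]
departments = [{"dept": 1, "title": "data"}, {"dept": 2, "title": "ops"}]

joined = join(employees, departments, "dept")
data_team = select(joined, lambda row: row["title"] == "data")
print(project(data_team, ["name"]))
```

Each operator takes relations in and returns a relation, which is why they compose freely.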

The following prerequisites ensure that you will gain the maximum
benefit from the class:

• Programming experience: This is a developer’s course. We
will write Hive, Pig, Cascading/Scalding/Cascalog applications.
Prior programming experience is recommended.

• Linux shell experience: Basic Linux shell (bash) commands
will be used extensively. Some prior experience is recommended.

• Experience with SQL databases: SQL experience is helpful 
for learning these languages, but not essential. 

The main format of this workshop will be follow-along, although
all code will be provided in case you want to code simultaneously.
You may log into remote EMR instances to build, test
and run the applications if you wish, but it will be on your account,
and the instructor may not be able to help you with debugging.
You will also be provided with all the exercise software, so
you can view it on your laptop if desired.

Level: Advanced 
Technical Classes

Overview 

A Simple Mongo Application Implemented and Compared 
in Node.js, SpringData and Ruby 

Andrew C. Oliver 

One simple app, three different technologies deployed locally 
to two different clouds. The application is executed, the source is 
examined, the approach is compared, the tools are demonstrated, 
and your questions are answered. 

Level: Overview 

Beyond Map/Reduce 

Dean Wampler 

Apache Hadoop is the current darling of the Big Data world. At 
its core is the Map/Reduce computing model for decomposing 
large data-analysis jobs into smaller tasks, and distributing those 
tasks around a cluster. Map/Reduce itself was pioneered at Google 
for indexing the Web and other computations over massive 
datasets. 
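
The decomposition into map and reduce tasks can be sketched with a single-process word count in Python. This does not use Hadoop's API; the function names are hypothetical, and a real cluster would run the map and reduce calls on different machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map over every split, group by key, then reduce each group."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Two invented input splits.
splits = ["big data big clusters", "data everywhere"]
print(word_count(splits))
```

Because each `map_phase` call sees only its own split and each `reduce_phase` call sees only one key, both phases can be distributed around a cluster independently.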

The strengths of Map/Reduce are cost-effective scalability and
relative maturity. Its weaknesses are its batch orientation, which
makes it unsuitable for real-time event processing, and the difficulty of
implementing data-analysis idioms in the Map/Reduce computing
model.

We can address the weaknesses in several ways. First, higher-level
programming languages, which provide common query and
manipulation abstractions, make it easier to implement Map/Reduce
programs. However, longer term, we need new distributed
computing models that are more flexible for different problems
and which provide better real-time performance.

We’ll review these strengths and weaknesses of Map/Reduce
and the Hadoop implementation, then discuss several emerging
alternatives, such as Cloudera’s Impala system for analytics,
Google’s Pregel system for graph processing, and Storm for event
processing. We’ll finish with some speculation about the longer-term
future of Big Data.

This class is good for developers, data analysts and managers, 
but people with Hadoop and/or programming experience will get 
the most out of it. 

Level: Overview 

Building Successful Data Science Teams 

Dan Mallinger 

Come to this class and join a lively conversation on Big Analytics.
We’ll discuss current trends in the field, including the new role
of Data Science beyond analytics, and how companies are embracing
new technologies and approaches.

We will frame the discussion with a client use case, covering
objectives, approaches and challenges encountered. Through this
lens, we’ll explore how organizations and individuals can transition
skills into the Big Data space from more traditional roles,
hiring strategies and team structures. Finally, we will discuss the
patterns that define a mature Data Science team and how firms
will grow to assess and evaluate these teams.

EMI This icon indicates code will be shown in a session.

While our conversation will be technology-agnostic, we will
pay particular attention to data science within the Hadoop
ecosystem.

What you’ll learn: 

• Best practices in Data Science projects 

• Common technologies and tools 

• Training and hiring strategies 

• Best team structures 

• The current state of the field 

• KPIs that define a mature Data Science practice

Level: Overview


How to See and Understand Big Data 

Jock Mackinlay 

Visual analysis is an iterative process that exploits the power of
the human visual system to help people work with all kinds of
data. When data is big, people must overcome the challenges of
wide data, tall data, and data from multiple sources, often coming
in fast and furiously. Attend this class to learn how people working
with data can address these challenges. The key technique is
to use multiple coordinated views of data during visual analysis
and storytelling with data.

You’ll learn: 

• What research and practice have taught us about designing 
great visualizations and dashboards. 

• Fundamental principles for designing effective coordinated 
views for yourself and others. 

• How to systematically analyze data from multiple databases 
using your visual system. 

The instructor works for Tableau Software, a provider of data 
visualization solutions. 

Level: Overview 

Microsoft’s Big Data Story 

Lynn Langit 

Big Data is hot, but with so much press covering open-source
projects, what does Microsoft have to offer in this area? More than
you might expect! From extensions to core products, such as the
addition of columnstore indexes in SQL Server 2012, to entirely
new products, such as Data Explorer, you will learn about the
“Microsoftification” of current Big Data offerings, such as
“Hadoop on Azure.” You’ll leave this class understanding how you
can best leverage your existing investment in Microsoft technologies
to take advantage of Big Data opportunities.

Level: Overview 
Technical Classes

Untangling the Relationship Hairball with a 
Graph Database EM 

Max De Marzi 

Not only has data gotten bigger, it’s gotten more connected.
Make sense of it all and discover what these Big Data connections
can tell you about your users and your business. Come to this
class to learn some of the different use cases for graph databases,
and how to spot the non-obvious opportunities in your data.

Level: Overview 

Visualizing Your Graph EM 

Max De Marzi 

Attend this class to learn how to see graphs conceptually and 
visually. Learn to think of your domain as a graph, and visualize it. 
See just one node, a handful of relationships, a sub-graph, or 
everything all at once. You’ll learn how to create a pretty picture of 
your friends’ friendships, see clusters of nodes congregate, and 
get lost in a 3D graph in this class. 

Level: Overview 

Intermediate 

Analytics Maturity Model EH 

John A. De Goes 

Every company is at a different stage in leveraging analytics to 
improve their operational efficiency and product offerings. In this 
class, you will learn an eight-stage analytics maturity model that 
companies can use to determine how far they are from the most 
analytical companies. 

Level: Intermediate 

Apache Cassandra—A Deep Dive EM 

Ben Coverston 

Recently, there has been some discussion about what Big Data
is. The definition of Big Data continues to evolve. Along with variety,
volume and velocity (which the usual suspects handle well),
other facets have been introduced, namely complexity and distribution.
Complexity and distribution are facets that require a different
type of solution.

While you can manually shard your data (Oracle, MySQL) or
extend the master-slave paradigm to handle data distribution, a
modern big data solution should solve the problem of distribution
in a straightforward and elegant manner without manual
intervention or external sharding. Apache Cassandra was designed
to solve the problem of data distribution. It remains the best
database for low latency access to large volumes of data while still
allowing for multi-region replication. We will discuss how
Cassandra solves the problem of data distribution and availability
at scale.
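
To make the idea of automatic data distribution concrete, here is a minimal Python sketch of a hash ring that places each key on a node and picks replicas by walking around the ring. It is an illustration only, not Cassandra's implementation: the names `Ring`, `token` and `replicas` are hypothetical, and Cassandra's real partitioners and replication strategies are considerably more involved.

```python
import bisect
import hashlib


def token(value):
    # Map any string onto a position on the ring (illustrative hash choice).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical hash ring: each node owns the keys just before its token."""

    def __init__(self, nodes):
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key, replication_factor=2):
        # Start at the first node clockwise from the key, then keep walking.
        start = bisect.bisect_right(self.tokens, (token(key), ""))
        count = min(replication_factor, len(self.tokens))
        return [
            self.tokens[(start + i) % len(self.tokens)][1]
            for i in range(count)
        ]


ring = Ring(["node-a", "node-b", "node-c"])
owners = ring.replicas("user:42")
print(owners)
```

No client has to decide where a row lives: the hash of the key does, which is the property that removes the need for manual sharding.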

This class will cover: 

• Replication

• Data Partitioning

• Local Storage Model

• The Write Path

• The Read Path

• Multi-Data-Center Deployments

• Upcoming Features (1.2 and beyond)

For the most benefit from this class, attend the “Getting Started 
with Cassandra” workshop. 

Level: Intermediate 

A Survey of Probabilistic Data Structures EM 

Jim Duey 

Big Data requires Big Resources, which cost Big Money. But if
you only need answers that are good enough, rather than precisely
right, probabilistic data structures can be a way to get those
answers with a fraction of the resources and cost.

This class will teach you about different data structures, give
some theory behind them, and point out some use cases. You will
learn how those structures can be used for common tasks, including
counting the unique items in a collection, determining if an
item is in a collection, and counting the occurrences of items in a
collection. If you want Big Data without spending Big Money, this
class is for you!
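
As a taste of the "is this item in the collection?" task, here is a minimal Bloom filter sketch in Python. It is one well-known probabilistic structure, chosen here for illustration; the class description does not say which structures are covered. The class name, sizes and hashing scheme are assumptions.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for user in ["alice", "bob", "carol"]:
    seen.add(user)

print(seen.might_contain("alice"))
```

The filter uses a fixed 1,024 bits no matter how many items are added; the price is that `might_contain` can occasionally answer True for an item that was never added.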

Level: Intermediate 

Building an Impenetrable ZooKeeper EH 

Kathleen Ting 

Apache ZooKeeper is a project that provides reliable and
timely coordination of processes. Given the many cluster resources
leveraged by distributed ZooKeeper, it’s frequently the
first to notice issues affecting cluster health, which explains its
moniker: “The canary in the Hadoop coal mine.”

Come to this class and you will learn: 

• How to configure ZooKeeper reliably 

• How to monitor ZooKeeper closely 

• How to resolve ZooKeeper errors efficiently 

Culling from the diverse environments we’ve supported, we will
share what it takes to set up an impenetrable ZooKeeper environment,
what parts of your infrastructure specifically to monitor,
and which ZooKeeper errors and alerts indicate something seriously
amiss with your hardware, network or HBase configuration.

Level: Intermediate

Building Applications Using HBase EM 

Amandeep Khurana 

Apache HBase is one of the popular new-age scalable NoSQL
databases, and has seen a significant increase in production use
cases over the last couple of years. HBase is an open-source version
of Google BigTable.

This class will teach you the basics of HBase, including its
architecture, data model, and how HBase is different from traditional
database systems in terms of design assumptions and also
with respect to building applications.

Level: Intermediate 

Distributed Search and Real-Time Analytics, Parts I & II EM 

Jason Rutherglen and Ryan Tabora 

In this hands-on, two-part class, you will learn the importance 
of distributed search from the instructors’ industry experience 
and knowledge of real-world use cases. We’ll introduce different 
architectures that incorporate distributed search techniques, and 
share pain points experienced and lessons learned. Building on 
that, we’ll depict the landscape of distributed search tools and 
their future directions. 

For the hands-on part, you will learn how to install and use 
Apache Solr for real-time Big Data analytics, search, and reporting 
on popular NoSQL databases. You’ll also learn some tricks of the 
trade and how to handle known issues. 

We will spend around 30 minutes providing some background 
and use-case information on distributed search. We’ll then go 
through finer technical details with exercises every 10 minutes or 
so. We will go through the code on stage, but you can follow along 
with the EC2 instances that will be set up. 

Prerequisites: You should be familiar with Java and Unix shell
commands. Some prior familiarity with SQL and Big Data solutions
like Hadoop/HBase/Cassandra/Solr would be helpful but
not required.

Level: Intermediate 

Extending Your Data Infrastructure with Hadoop EM 

Jonathan Seidman 

Hadoop provides significant value when integrated with an existing
data infrastructure, but even among Hadoop experts there’s
still confusion about options for data integration and business
intelligence with Hadoop. This class will help clear up the confusion.
You will learn:

• How can I use Hadoop to complement and extend my data 
infrastructure? 

• How can Hadoop complement my data warehouse? 

• What are the capabilities and limitations of available tools?

• How do I get data into and out of Hadoop? 

• How can I use my existing data-integration and 
business-intelligence tools with Hadoop? 

• How can I use Hadoop to make my ETL processing more 
scalable and agile? 

We’ll illustrate this with an end-to-end example data flow using
open-source and commercial tools, showing how data can be imported
into and exported from Hadoop, how ETL processing works in Hadoop,
and how to report on and visualize data in Hadoop. You will also
learn about recent advancements that make Hadoop an even more
powerful platform for data processing and analysis.

Level: Intermediate 


Getting Started with Predictive Modeling: Simple Models 
and Basic Evaluation 

Claudia Perlich 

This class presents the basic steps of building a predictive
model, and encourages you to get your feet wet on a small problem
using Excel and Weka to build a model.

You will learn about selecting training examples, constructing
an appropriate feature vector, some basic feature selection,
building models using three to four different machine-learning
techniques (logistic regression, decision trees and rule induction),
and evaluating the resulting models using cross validation
and a separate test set.

Level: Intermediate 


Hadoop Backup and Disaster Recovery 101 EM 

Easier Aziz 

Any production-level implementation of Hadoop must have its 
data protected from threats. Threats to data integrity can be 
human-generated (malicious/unintentional) or site-level (power 
outage, flood, etc.). As soon as you start to identify these threats, 
it’s important to develop a backup or disaster-recovery solution 
for Hadoop! 

In this class, you will learn the unique considerations for
Hadoop backup and disaster recovery, as well as how to navigate
the common issues that arise when architects and developers
look to protect the data. We’ll cover:

• How to model your backup/disaster-recovery solution, considering
your threat model and specifics around data integrity, business
continuity, and load balancing.

• Best practices and recommendations, highlighting Hadoop in
contrast to traditional SAN/DB systems; replication versus “teeing”
models for ensuring DR; replication scheduling; Hive; HBase;
managing bandwidth; monitoring replication; using one’s secondary
beyond replication; and a survey of existing tools and
products that can be used for backup and DR.

After taking this class, you should be able to explain to your
organization the right way to effect a backup or data-recovery
solution for Hadoop.

Level: Intermediate 

Hadoop by Example MS 

Serge Blazhievsky 

This class is designed to demonstrate the most commonly
used Map/Reduce design patterns for various problems. Performance
and scalability will be taken into consideration.

The class will present a general overview of the problems that
can be solved using Map/Reduce, as well as scalability and
performance tuning for clusters of different sizes. The techniques
described here can be used on all Hadoop distributions.

The following technical problems will be covered: 

• “Hello world!” of the Map/Reduce universe—a word-count 
example 

• Map-only Map/Reduce jobs and their usage for ETL-type
jobs

• Global sorting techniques

• Sequence files and their usage in Map/Reduce jobs

• Map files and their usage in Map/Reduce jobs

• Reduce-side join and its advantages and limitations 

• Map-side join and its advantages and limitations 

Each technique will be provided with a code example that can
be used as a template. No prior knowledge about the topic is
required; however, some Java knowledge is recommended.
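
For readers new to the model, the word-count example mentioned above can be sketched in a few lines of plain Python. This is not Hadoop code and not the instructor's template; it only mimics the map, shuffle and reduce steps in a single process, and the function names are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    # Collapse all counts for one word into a single total.
    return word, sum(counts)


lines = ["big data big money", "big resources"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
print(word_counts)
```

In a real cluster, the map and reduce functions run on many machines and the framework performs the shuffle; a map-only job simply stops after the first step.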

Level: Intermediate 

Hadoop Programming with Scalding MS 

Dean Wampler 

Scalding is a Scala API for writing advanced data workflows for
Hadoop. Unlike low-level APIs, it provides intuitive pipes-and-filters
idioms, while hiding the complexities of Map/Reduce programming.
Scalding wraps the Java-based Cascading framework in Functional
Programming concepts that are ideal for data problems,
especially the mathematical algorithms for machine learning.

This class will use examples to demonstrate these points.
Developers will see that Scalding is an ideal tool when they need a
more full-featured and flexible tool set for Big Data applications,
beyond what Hive or Pig can provide.

Level: Intermediate 
Hands-on Hadoop for Developers: Understanding
Map/Reduce MS

Lynn Langit 

In this class, you’ll get hands-on experience working with a
leading implementation of Hadoop: Cloudera’s CDH4 release.
We’ll work with the company’s virtual machine and concentrate
on learning how to get business value from a Hadoop cluster
of data by writing increasingly complex queries. These queries are
known as Map/Reduce jobs, and we will write them in Java. Even
if you aren’t a Java pro, you can still get value from this class, as we
will also work with higher-level languages, such as Hive’s HQL
and Pig. You’ll finish this class understanding when and how to
get value from a Hadoop cluster.

Prerequisite knowledge: Database development using T-SQL; 
Java or C# helpful, but not required. 

Prerequisite software: Cloudera Hadoop VM v CDH4. A link 
will be provided for download. 

Level: Intermediate 

Hands-on MongoDB MS 

Lynn Langit 

Come to this class to learn how to use MongoDB, one of the 
most popular NoSQL databases. Together we will install and use 
MongoDB so that you can start understanding which business 
scenarios are a fit for this type of data storage. After setup, you’ll 
learn basic administration, such as data inserts, indexing and 
more. Then you’ll get experience with the various ways to query a 
MongoDB database, using GUI tools such as MongoVUE, via the 
native query language, Map/Reduce, and the recently released 
aggregation framework. After this class, you will know how to use 
MongoDB in your projects. 

Prerequisite knowledge: some RDBMS admin or development 
is helpful, but not required. 

Setup software: The class will be taught on Macs, but you can 
implement on Windows if you wish. 

Level: Intermediate 

HBase Schema and Table Design Principles MS 

Amandeep Khurana 

More and more companies are adopting the open-source
Apache HBase as the back-end database for Big Data applications
that have very large tables, with billions of rows and millions of
columns. Come to this class to learn how to design tables to
leverage the design principles and features of HBase.

HBase tables are very different from relational database tables, 
and there are several concepts at play that you need to keep in 
mind while designing them. This class will introduce you to the 
basics of HBase schema and table design and how to think about 
tables when working with this popular database. 

Level: Intermediate 
Implementing a Real-Time Data Platform Using HBase EH

Ravi Veeramachaneni 

Apache HBase is a distributed, column-oriented data store for 
processing large amounts of data in a scalable and cost-effective 
way. Most of the developers, designers and architects have lots of 
background or experience in working on relational databases 
using SQL; the transition to NoSQL is challenging and oftentimes 
confusing. 

Come to this class to learn what and what not to consider
when using HBase while implementing a real-time data-management
system for your organization. The focus will be on sharing
knowledge and real-world experiences to help the audience
understand and address the full spectrum of technical and business
challenges. We will offer recommendations and lessons learned in
terms of scalability, performance, reliability and transitioning to
the new platform.

You will learn the major considerations for infrastructure,
schema design, implementation, deployment and tuning of HBase
solutions. At the end of the class, you will be in a better position to
make the right choices on applicability, design, tuning and
infrastructure selection for a real-time HBase-enabled data platform.

Level: Intermediate

Map/Reduce Tips and Tricks EH 

Boris Lublinsky 

This class will start with a short Map/Reduce architectural
refresher, showing how it executes in Hadoop and what the main
Map/Reduce components and classes are, which can be used for
customizing an execution. We will then describe the most common
possible Map/Reduce customizations and the reasons for
their implementation.

The majority of time will be dedicated to going through the 
code examples, showing how to design and implement custom 
input/output formats, readers and writers, and partitioners. We 
will also show what to expect out of any customization. 

Time permitting, we will also talk about high-level Map/Reduce
frameworks, namely Apache Crunch.

Level: Intermediate 

Mastering Sqoop for Data Transfer for Big Data EB3 

Jaroslav Cecho and Kathleen Ting 

Apache Sqoop is a tool for efficiently transferring bulk data
between Hadoop-related systems (such as HDFS, Hive and HBase)
and structured data stores (such as relational databases, data
warehouses and NoSQL systems).

Even though Sqoop works very well in production environments,
come to this class to learn how to handle bulk-transfer
challenges that many Sqoop users face.

Also come to this class to learn about Sqoop 2, the next generation
of the tool, which is currently in development! Sqoop 2 will address
ease of use, ease of extension and security. We’ll talk about
Sqoop 2 from both the development and operations perspectives.

Level: Intermediate
NoSQL for SQL Professionals

Dipti Borkar 

With all of the buzz around Big Data and NoSQL (non-relational)
database technology, what actually matters for today’s SQL
professional? Learn more in this talk about Big Data and NoSQL
in the context of the SQL world, and get to what’s truly important
for data professionals today.

In this presentation you’ll learn: 

• The main characteristics of NoSQL databases 

• Differences between distributed NoSQL and relational
databases

• Use cases for NoSQL technologies, with real-world examples 
from organizations in production today 

Level: Intermediate 


Oozie: A Workflow Scheduler for Hadoop EH 

Boris Lublinsky 

This class will start with a discussion of Oozie’s role in the
Apache Hadoop ecosystem and its relationship to other Hadoop
platform components. We will then describe the most straightforward
use cases/examples where Oozie can be helpful or even
necessary. From that we move to Oozie workflow specification:
components, illustrated with examples for various Oozie actions
(Java, Map/Reduce, Hive, Pig, Sqoop).

Then we will describe Oozie architecture. We will present the
main Oozie components, job life cycle, retry and recovery. We will
demonstrate the Oozie management console in integration/complementary
usage with other Hadoop GUI tools (Map/Reduce
Administration, Task Tracker, NameNode viewer, Fair Scheduler
Administration, and Log File View).

We will then discuss Oozie job parameterization, expression
language, job configuration, runtime artifacts placement, and job
submission using the Oozie command-line utility (CLI). We will
demonstrate how to check statuses or stop Oozie jobs from CLI.
We will also briefly describe Oozie APIs (Java and REST), and
provide fragments of code. After that, we will present Oozie
coordinator and discuss how Oozie allows expressing dependencies
between jobs and groups of jobs using “Synchronous Datasets.”
We will touch on Oozie SLA support.

Then we will describe Oozie bundles and demonstrate the
whole hierarchy of Oozie artifacts: from bundles to coordinator to
workflows to actions to processes running on Hadoop clusters. In
conclusion, we will discuss Oozie limitations and ways to overcome
some of them (via extension and customization). We will
also talk about new features in future versions.

Level: Intermediate 


Real-Time Hadoop EH 

Boris Lublinsky and Michael Segel 

We will start with a short introduction of different approaches
to using Hadoop in the real-time environment, including real-time
queries, streaming and real-time data processing and delivery.
Then we’ll describe the most common use cases for real-time
queries and products implementing these capabilities. We will
also describe the role of streaming, common use cases, and products
in the space.

The majority of time will be dedicated to the usage of HBase as
a foundation for real-time data processing. We will describe several
architectures for such an implementation, as well as a high-level
design and implementation for two examples: a system for storing
and retrieving images, and using HBase as a back end for Lucene.

Level: Intermediate

Running Mission-Critical Applications on Hadoop 

Dave Jespersen 

This class will look at what is involved when you move Hadoop
from a lab environment to actual deployment in production. We
will cover the critical enterprise-grade features like data integration,
data protection, business continuity, and high availability, and
discuss the ways you can accomplish this in your environment. We
will also identify potential stumbling blocks, identify what a platform
can or can’t provide, and help determine the scope and level
of customization necessary to make your deployment successful.

At the end of the class, you will better understand how to
move Hadoop from the test bed to production deployment, what
is involved in the process, and how to run a mission-critical
Hadoop environment. Where appropriate, there will be real-world
examples.

Level: Intermediate 
Taming Elephants, Bees and Pigs - The Big Data Circus

Ashish Thusoo

This class will discuss the reasons and motivations behind the
Big Data revolution and how it has evolved from previous
data-processing technologies. Hadoop’s technical advantages and
emphasis on scale over raw performance are primarily driven by the
growth in variety of data sources, for example.

Based on real-world experience while at Facebook, the instructor
will talk about some of the key challenges of scale and the evolution
of these technologies out of necessity, starting with
Hadoop, expansion with SQL on top, and adding MicroStrategy
and business intelligence layers. This class will cover specific issues
and solutions at Facebook, such as latency gaps in the infrastructure
solved by caching results in the MySQL tier, and the
investment made to build low-latency query engines on HDFS.
This will lead to a discussion of business demands and the technical
responses.

Finally, we will discuss the future of Big Data and how technology
is continuing to simplify the process and become accessible
for all. We are able to address noise in the data and use the cloud
to simplify what to use and what not to use by hiding these technologies
behind a comprehensive data platform.

Attend this class to learn about the issues that were encountered
in the trenches at Facebook. They were addressed by the trailblazers
and can now be handled even by smaller, leaner companies.

Level: Intermediate


Setting up a Neo4j Graph Database Cluster 
on Amazon EC2 EM 

Max De Marzi 

Learn how to set up a Neo4j Enterprise Cluster on Amazon EC2
to handle your very connected Big Data. Hints, tips and best practices
will help you get started with a proof of concept or full production
system. The instructor works for Neo Technology,
creators of the Neo4j Graph Database.

Level: Intermediate 

Seven Deadly Hadoop Misconfigurations EM 

Kathleen Ting 

Misconfigurations and bugs break the most Hadoop clusters. 
Fixing misconfigurations is up to you! Attend this session to learn 
how to get your Hadoop configuration right the first time. In 
some support contexts, a handful of common issues account for a 
large fraction of issues. That is not the case for Hadoop, where 
even the most common specific issues account for no more than 
2% of support cases. Hadoop errors show up far from where you 
configured, making it hard to know what log files to analyze. It 
pays to be proactive. Come to this class! 

Level: Intermediate 


Advanced 

Big Data Science: Extracting Truth from Large, 
Multi-Structured Data, Parts I & II 

Dan Mallinger 

Come to this class for an overview of Big Data Science from
planning to execution. We will start by covering the processes and
best practices for successfully doing Data Science in a business
setting. We will then review a use case and walk through the
exploratory to confirmatory modeling stages. Each use case includes
source code in R and Pig, with the goal of showing how to
parallelize analysis in R over Hadoop.

The use case will highlight the advantages (and difficulties) of 
working with multi-structured data through text analysis, linear 
models and more. A theme throughout will be the importance of 
triangulating around truth when problems are intractable. 

The following prerequisites ensure that you will gain the maximum
benefit from the class:

Programming experience: Big Data is still the Wild West of 
technologies, and programming skills are required to wrangle and 
analyze Big Data. 

Analysis experience: Although this is not a statistics course,
understanding the principles of analysis as well as research methods
will help in reapplying the lessons.

Level: Advanced 
Building High-Volume Web Applications MS

Andrew Wilson 

Keeping a Web application fast under highly variable user load
presents website developers with a frustrating problem: What
happens when load spikes and the site can’t scale out? This often
requires using a variety of techniques to ensure high performance.
We’ll cover how to manage fast, slow and very slow-moving
data, as well as methods for ensuring fast client responses.

Using Spring, Ehcache, jQuery, JSON and VoltDB, you will learn
how to write high-throughput, low-latency Web applications that
can keep up with heavy, unexpected load from the front to the
back end. The instructor works for VoltDB.

Level: Advanced 

Data Modeling and Relational Analysis in a NoSQL World EM 

Michael Miller 

The new wave of NoSQL technology is built to provide the flexibility
and scalability required by agile Web, mobile and enterprise
applications. Interestingly, any system that supports
chained Map/Reduce processing (specifically Map/Reduce-Map)
fulfills the basic query requirements of a SQL engine. Therefore,
we will work to help you bridge the gap between SQL, relational
(big) data, and the brave new world of NoSQL.

In this class, you will learn how to model real-world relational
data in a modern document database. We next go on to compile
various SQL operations (SELECT, SUM, AVG, JOIN, etc.) into
exceptionally simple Map/Reduce programs. We finish with a study
demonstrating the performance, scalability and “time-to-value”
benefits of this approach, specifically the pre-computation of
materialized views.
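
To illustrate what compiling a SQL operation into Map/Reduce can look like, here is a small Python sketch of `SELECT customer, AVG(amount) ... GROUP BY customer` over a list of documents. It is an assumption-laden illustration, not the instructor's material: the document fields, function names and sample data are all hypothetical, and the code runs in one process rather than on a database.

```python
from collections import defaultdict

# Hypothetical order documents, as they might sit in a document database.
orders = [
    {"customer": "acme", "amount": 10.0},
    {"customer": "acme", "amount": 30.0},
    {"customer": "zenith", "amount": 5.0},
]


def map_order(doc):
    # Emit (group key, (sum, count)) so partial results can be merged later.
    yield doc["customer"], (doc["amount"], 1)


def reduce_stats(values):
    # Merge partial (sum, count) pairs; SUM and COUNT fall out directly.
    total = sum(v[0] for v in values)
    count = sum(v[1] for v in values)
    return total, count


grouped = defaultdict(list)
for doc in orders:
    for key, value in map_order(doc):
        grouped[key].append(value)

averages = {}
for key, values in grouped.items():
    total, count = reduce_stats(values)
    averages[key] = total / count  # AVG is derived from SUM and COUNT

print(averages)
```

Emitting a (sum, count) pair rather than a ready-made average is the design choice that matters: pairs can be merged incrementally, which is what makes storing the result as a pre-computed view practical.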

The class will be a mix of chalkboard and interactive
demonstrations.

Prerequisites: Bring a laptop with a modern Web browser
(Chrome, Safari or Firefox). Previous experience with basic scripting
languages (e.g. JavaScript) is an advantage but not a requirement.
All data and code samples will be provided at the beginning
of the class.

Level: Advanced 
Getting Started with R and Hadoop, Parts I & II MS

Jeffrey Breen 

Increasingly viewed as the lingua franca of statistics, R is a natural
choice for many data scientists seeking to perform Big Data
analytics. And with Hadoop Streaming, the formerly Java-only Big
Data system is now open to nearly any programming or scripting
language. This two-part class will teach you options for working
with Hadoop and R before focusing on the RMR package from the
RHadoop project. We will cover the basics of downloading and
installing RMR, and will test our installation and demonstrate its
use by walking through three examples in depth.

You will learn the basics of applying the Map/Reduce paradigm
to your analysis and how to write mappers, reducers and
combiners using R. We will submit jobs to the Hadoop cluster and
retrieve results from the HDFS. We will explore the interaction of
the Hadoop infrastructure with your code by tracing the input
and output data for each step. Examples will include the canonical
“word-count” example, as well as the analysis of structured
data from the airline industry.

No specific prerequisite knowledge is required, but a familiarity
with R and Hadoop or Map/Reduce is helpful.

Level: Advanced 


High-Speed Data Ingestion with Sharded NewSQL 
Databases EM 

Andrew Wilson 

Websites and back-end data applications have grown to the
point that the data coming in is too fast for traditional databases
to absorb without causing significant latencies for the clients. In
this session, we will look at how to take advantage of NewSQL
sharding to consume data as fast as possible, and then extract it to
a traditional database in a transactionally secure manner.

We’ll combine traditional slides with real code, demonstrating
how to quickly import and extract from a high-speed NewSQL
database to a slower-speed traditional SQL database. The instructor
works for VoltDB, which uses the NewSQL term to describe its
VoltDB database.

Level: Advanced 



How to Integrate Structured and Unstructured Data
with Avro MS

Serge Blazhievsky 

This class is designed to demonstrate how to use the Apache 
Avro Java serialization libraries with Hadoop frameworks to speed 
up integration of big volumes of unstructured data sets. 

The class will teach you about the Avro framework. You will 
learn about: 

• Avro schema definitions and data types 

• Avro record creation 

• How to create Avro schemas programmatically 

• How to sort records by setting a property in schema 

• How to read records from a file 

• Hadoop Map/Reduce integration with Avro data

• Advantages of using Avro data over flat files or map files

• Specifics of the integrations with Mapper and Reducer code 

• Avro format for Map/Reduce result output 

• Cascading Map/Reduce jobs 

• Map/Reduce for converting flat files into Avro data 
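
As a small preview of the schema topics in the list above, the following Python sketch builds an Avro record schema programmatically and serializes it to the JSON form that Avro uses. The record and field names are invented for illustration. The `order` attribute on a field is the schema property that controls how that field affects record sort order (`ascending`, `descending` or `ignore`).

```python
import json

# Hypothetical record type for illustration; all names are assumptions.
flight_schema = {
    "type": "record",
    "name": "Flight",
    "namespace": "example.avro",
    "fields": [
        # Sort records by carrier first, then by longest delay.
        {"name": "carrier", "type": "string", "order": "ascending"},
        {"name": "delay_minutes", "type": "int", "order": "descending"},
        # Optional free-text field that should not influence sorting.
        {
            "name": "comment",
            "type": ["null", "string"],
            "default": None,
            "order": "ignore",
        },
    ],
}

schema_json = json.dumps(flight_schema, indent=2)
print(schema_json)
```

The resulting JSON text is what an Avro library would parse; the class itself uses the Java libraries, so treat this only as a language-neutral look at the schema format.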

We’ll use two real-life examples to demonstrate the advantage 
of using Avro versus regular files. 

Level: Advanced 

How to Fit a Petabyte in Apache HBase MM 

Jean-Daniel Cryans 

There are many ways to load a petabyte of data in HBase, and 
this class will show you the best approaches! We will first review 
the solutions that are commonly adopted instinctively, and show 
why they fail. This will help you understand the practicalities of 
HBase’s architecture. 

The first technique that will be taught is better schema de¬ 
signs—that is, how to create keys that won’t inflate the size of your 
data set. The second technique that will be presented is better 
management of the loading of the data through pre-splitting of 
the regions and tuning the cluster for that type of workload. 
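To give a flavor of the pre-splitting idea, here is a minimal sketch. It is not the instructor's code. It assumes row keys begin with a fixed-width hexadecimal prefix (for example, a hash of the natural key), and the function name hex_split_points is hypothetical. It computes evenly spaced boundary keys that could be supplied when a table is created, so that a bulk write is spread across regions from the start.

```python
def hex_split_points(num_regions, width=8):
    """Return num_regions - 1 evenly spaced boundary keys.

    Assumption: row keys start with a fixed-width, lowercase hex prefix
    of `width` characters, so the key space is [0, 16**width).
    """
    if num_regions < 2:
        return []  # a single region needs no boundaries
    space = 16 ** width
    return [
        format(space * i // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


# Four regions over a two-character hex prefix need three boundaries.
boundaries = hex_split_points(4, width=2)
print(boundaries)
```

Short fixed-width prefixes also keep keys compact, which matters because the row key is stored with every cell.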

Finally, bulk loading will be discussed as the best way to
populate the database. The code that will be used for the
demonstrations will be available on GitHub.

Level: Advanced

In-Database Predictive Analytics MM 

John A. De Goes 

Predictive analytics have long lived in the domain of statistical 
tools like R. Increasingly, however, as companies struggle to deal 
with exploding volumes of data not easily analyzed by small data 
tools, they are looking at ways of doing predictive analytics di¬ 
rectly inside the primary data store. 

This approach, called in-database predictive analytics, elimi¬ 
nates the need to sample data and perform a separate ETL 
process into a statistical tool, which can decrease total cost, im¬ 
prove the quality of predictive models, and dramatically shorten 
development time. In this class, you will learn the pros and cons
of doing in-database predictive analytics, see highlights of its
limitations, and survey the tools and technologies necessary to head
down the path.

Level: Advanced 

Matrix Methods with Hadoop MM 

David Gleich 

Get a brief introduction to thinking about data problems as 
matrices, and then learn how to implement many of these algo¬ 
rithms in the Hadoop streaming framework. The data-as-matrix 
paradigm has had a rich history, and the point of this talk is to 
give folks some idea of which statistical algorithms are likely to be 
reasonably efficient in Map/Reduce, and which are probably not 
going to be so reasonable. This will involve a few ideas: 

• How to store matrix data, and the performance tradeoffs. 
The idea is to take natural ways of looking at methods to
store data and describe them as a way of storing a matrix. 
This gives some insight into how a method could be fast. 

• How to implement some basic matrix operations that form 
the basis of many numerical and statistical algorithms. 

• Problems we’ve run into working with matrix data, and how 
we’ve solved some of them. 

• Ideas for future platforms that are ideal for this case. 

• A concern about the numerical accuracy of Big Data.

Most of the code samples will use Python interfaces for Hadoop
streaming, such as Dumbo, mrjob or Hadoopy.

Prerequisites: 

• You ought to have a rough handle on what a matrix represents. 

• Those with a linear algebra background will probably get 
more out of this talk, but I’ll explain any linear algebra prop¬ 
erty with a statistical or data-oriented analog. 

• You should know about how to use Map/Reduce or Hadoop;
most of the code examples will use Hadoop streaming via
Dumbo.

Level: Advanced 


Selecting the Right Big Data Tool for the Right Job, 
and Making It Work MM 

Eddie Satterly 

This class will focus on a wide range of Big Data solutions from 
open-source to commercial solutions, and the specific selection 
criteria and profiles of each. As in all technology areas, each solu¬ 
tion has its own sweet spots and challenges, either in CAP theo¬ 
rem, ACID compliance, performance or scalability. This class will 
provide an overview of the technical tradeoffs for the list of solu¬ 
tions in technical terminology. Once the technical tradeoffs are 
reviewed, we will review the cost and value of open-source solutions
versus commercial software, and the tradeoffs that folks
must make to choose one over the other.

The next phase will go into great detail on use cases for specific 
solutions based on real-world experience. All of the specific use 
cases have been seen firsthand from our customers. The solutions 
will be reviewed to the level of specific technical architecture and 
deployment details. This is intended for a highly technical audi¬ 
ence and will not provide any high-level material on the solutions 
discussed. It is assumed that you will have working knowledge of 
solutions such as SQL, NoSQL, distributed file systems and time 
series indexes. You should also have an understanding of CAP 
theorem, ACID compliance and general performance characteris¬ 
tics of systems. 

Level: Advanced 
The Dark Sides of Predictive Modeling:
Tricks and Pitfalls EM

Claudia Perlich 

This class will teach advanced strategies to improve model per¬ 
formance, including multi-state modeling and advanced feature 
engineering. In addition, we will look at some of the most com¬ 
mon pitfalls: over-fitting and leakage. 

With the growing set of tools available for data storage, man¬ 
agement, consolidation and preprocessing, the art of exploratory 
data analysis is getting lost. Too often we are trying to make sense
of data that is “cleaned” beyond recognition. It makes a major
difference whether missing values have been removed or replaced
by something else.

In the end, data analysts work with data that are not what they 
think they are; they are unaware of sampling biases or informa¬ 
tion “from the future” that really should not be allowed in the 
model. As a result, the models look initially very good, and fail 
when used in reality. We will also cover some strategies of manag¬ 
ing really large amounts of data, including strategic sampling and 
grouping examples. 
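One common guard against information “from the future” is to split data by time rather than at random. The following is a minimal sketch under stated assumptions, not the instructor's method: the record layout (day, features, label) and the function name temporal_split are hypothetical.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split records so every training row strictly precedes `cutoff`.

    Assumption: each record is a (day, features, label) tuple.
    """
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


# Hypothetical toy data, for illustration only.
records = [
    (date(2013, 1, 5), {"visits": 3}, 0),
    (date(2013, 1, 20), {"visits": 7}, 1),
    (date(2013, 2, 2), {"visits": 1}, 0),
    (date(2013, 2, 14), {"visits": 9}, 1),
]
train, test = temporal_split(records, date(2013, 2, 1))
print(len(train), len(test))
```

Any feature computed from events on or after the cutoff would still leak, so the features themselves must be built only from earlier data.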

Prerequisites: You should have some experience in predictive 
modeling. 

Level: Advanced 

Using Apache HBase’s API for Complex Real-Time 
Queries EM 

Jean-Daniel Cryans 

Apache HBase is generally viewed as an augmented key-value 
store, but what does that mean really? In this class, you will ex¬ 
plore the basic functionalities (get, put, delete and scan), and 
then fast-forward to the good parts. You will see how to use the fil¬ 
ters in order to run more efficient scans, like running lightweight 
counts over millions of rows. 

The class will also cover the different compare-and-swap (CAS)
operations that HBase enables by being strongly consistent, like 
incrementing counters or appending data without having to 
bring the data back to the client. We’ll also cover more advanced 
features like using coprocessors. The code that will be used for 
the demonstrations will be available on GitHub. 

Level: Advanced 

Visualizing Big Data in the Browser EM 

Simon Metson 

Whether for exploratory investigation or final presentation, vi¬ 
sualization and data go hand in hand. We posit that visualization 
is in fact the vehicle by which data scientists, analysts and devel¬ 
opers identify and disseminate the true value in data. Is visualiza¬ 
tion technology keeping pace with recent explosive innovations 
in data management, storage and processing? In a word: yes. 

In this class, you will learn how to create custom, agile, fly¬ 
weight in-browser applications for data analysis and visualiza¬ 
tion. We will leverage modern browser tools such as D3, Cubism 
and Crossfilter to do client-side data processing and visualization
in concert with server-side distributed Map/Reduce processing in
a scalable document database. 

You will create a powerful, two-tier application that is served di¬ 
rectly to the browser from a distributed database, allowing you to 
deploy on cloud-hosting providers with no server install required. 

Requirements: The class will be a mix of lecture and demon¬ 
stration. Come prepared with a laptop, wi-fi and a modern Web 
browser (Safari, Chrome or Firefox) and you too will make data 
beautiful. 

Level: Advanced 

What You Can Do with a Tall-and-Skinny QR Factorization 
on Hadoop EM 

David Gleich 

A common Big Data-style dataset is one with a large number 
(millions to billions) of samples with up to a few thousand fea¬ 
tures. One way to view these datasets from an algorithmic per¬ 
spective is to treat them as a tall-and-skinny matrix. 

In this class, you will learn one of the most widely used matrix 
decomposition techniques: the QR factorization. 

First, you’ll see how to use the QR factorization to solve various 
statistical analysis problems on datasets with many samples. For 
instance, we’ll see how to compute a linear regression for such a 
problem, how to compute principal components, and how easily
we can implement a new scalable Kernel K-Means clustering 
method. 

Next, you will learn how to compute these QR factorizations in 
Hadoop. We use an algorithm that maps beautifully onto the 
Map/Reduce distributed computational engine. It boils down to 
computing independent QR factorizations in each mapper and 
reducer stage. One challenge we faced was in improving the nu¬ 
merical stability of the routine in order to get a good estimate of 
the orthogonal matrix factor, which is most important for appli¬ 
cations that need a tall-and-skinny matrix singular value decom¬ 
position. 
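The two-stage idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code used in the class: the helper names r_factor and tsqr_r are hypothetical, a small modified Gram-Schmidt routine stands in for a library QR, each block is assumed to have full column rank, and only the triangular factor R is computed (recovering a numerically good orthogonal factor is the harder part mentioned above).

```python
import math
import random


def r_factor(rows):
    """Upper-triangular R from a QR factorization of `rows`.

    Uses modified Gram-Schmidt and assumes full column rank, so the
    diagonal of R is positive and R is unique.
    """
    n = len(rows[0])
    # Work column by column on a copy of the data.
    cols = [[row[j] for row in rows] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return r


def tsqr_r(blocks):
    """R factor of the stacked blocks, computed in two stages."""
    # "Map" stage: factor each block of rows independently.
    local_rs = [r_factor(block) for block in blocks]
    # "Reduce" stage: stack the small R factors and factor once more.
    stacked = [row for r in local_rs for row in r]
    return r_factor(stacked)


random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(3)] for _ in range(12)]
blocks = [rows[i:i + 4] for i in range(0, 12, 4)]

r_direct = r_factor(rows)
r_tsqr = tsqr_r(blocks)
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(r_direct, r_tsqr)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Because each block contributes only a small n-by-n triangle to the second stage, the data shipped between stages depends on the number of features, not the number of samples.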

Prerequisites: 

• You ought to have a rough handle on what a matrix repre¬ 
sents. 

• Those with a linear algebra background will probably get 
more out of this talk, but I’ll provide any linear algebra 
property with a statistical or data-oriented analog. 

• You should know about how to use Map/Reduce or Hadoop; 
most of the code examples will use Hadoop streaming via 
Dumbo in Python. 

• If you attend my other talk, “Matrix Methods with Hadoop,” 
you’ll get more out of this class, but it isn’t required. 

• If you are familiar with linear regression, principal compo¬ 
nents or Kernel matrices, you’ll get more out of this class. 

Level: Advanced 
Basier Aziz

Basier is currently an independent consultant and was 
previously Director of Product Management at Cloudera.
He served in product-management roles in data
management for the past five years between Oracle 
and Cloudera, and has most recently worked on a number of 
Cloudera products both on open-source platforms (Flume, Hive) 
and on the enterprise side (Cloudera Manager). He TA’d a course 
for MIT Electrical Engineering and Computer Science students 
between 2006 and 2007. 

Serge Blazhievsky 

Serge is Principal Software Engineer at Nice Systems,
and is an experienced developer and architect with a 
rich background in C++/Java and distributed systems. 
Nice Systems uses Hadoop infrastructure for various 
data-processing needs. His previous company used Hadoop in¬ 
frastructure for all reporting needs, and before that, Serge de¬ 
signed Hadoop infrastructure used for Internet crawling and 
Web-page analysis. Serge holds a Master’s degree in Computer
Engineering from Santa Clara University. Serge is a regular con¬ 
tributor to various Hadoop conferences, including the Hadoop 
User Group at Yahoo, the creator of Hadoop. 

Dipti Borkar 

Dipti is Director of Product Management at Couchbase,
where she is responsible for the company’s flagship
product, Couchbase Server, and works with
customers and users to understand emerging require¬ 
ments for low-latency, scalable data stores. Dipti has deep techni¬ 
cal experience in the database industry having worked at IBM as 
a software engineer and Development Manager for the DB2 
server team, and then at MarkLogic as a Senior Product Manager. 
Ben Coverston

Ben currently helps coordinate the training and sup¬ 
port activities at DataStax. He has over 15 years of de¬ 
velopment experience, and has written code running 
on some of the largest travel websites in the world. He 
became interested in Big Data through his experiences in trou¬ 
bleshooting data-related problems in which the velocity and vol¬ 
ume of data exceeded the capabilities of a single machine. 



Jean-Daniel Cryans 

Jean-Daniel works as a Software Engineer at Cloudera
on the Storage team, where he works on making 
Apache HBase better. Previous to that, he worked at 
StumbleUpon where he also worked on HBase while 
maintaining its production deployment. Jean-Daniel enjoys 
teaching HBase to newcomers and old-timers alike in the open- 
source community, or by giving presentations at Big Data and 
Apache Hadoop-related conferences and meetups. He became a 
committer and PMC member on HBase in 2008 when he was still 
an undergrad student at ETS Montreal. 




John A. De Goes 

John is CEO and CTO of Precog, and is responsible for 
leading the design and development of the company’s 
data-warehousing and analysis platform. He has been 
working professionally in distributed systems design 
and development for more than a decade. 

Author of multiple best-selling technical books, and a major 
contributor to open source, John has an extensive background in 
scientific and distributed computing, and in large-scale analytics. 
John is a frequent and well-received speaker at industry events. 
Recent engagements include DataWeek Conference, Glue Confer¬ 
ence, Frontier Developers, and NEScala. 


Jeffrey Breen 

Jeffrey is the Principal of the Think Big Academy at 
Think Big Analytics. Jeffrey has been very active in 
local user groups, has taught and mentored through¬ 
out his career, and has presented talks recently on R 
and Hadoop to the Data Warehouse Institute, the Chicago Area 
Hadoop and R User groups, and the Boston Predictive Analytics 
Meetup. Jeffrey has also developed and delivered the RHadoop 
training course, as well as all materials for Revolution Analytics. 

Jaroslav Cecho 

Jaroslav works as a Software Engineer at Cloudera, and 
is a committer and PMC member in three top-level 
Apache projects: Sqoop, Flume and MRUnit. 




Max De Marzi 

Max is a Software Field Engineer at Neo Technology, 
where he built the Neography Ruby gem, a REST API wrapper
for the Neo4j Graph Database. He is addicted to
learning new things, and he loves a challenge and 
finding (and sharing) pragmatic solutions. 

Jim Duey 

Jim is a Programmer at Lonocloud, and has been a 
professional programmer for more than 20 years (the 
last three in Clojure). He’s done work with embedded
systems in multiple industries and languages like 
Forth, Delphi and C++. Jim’s blog on Clojure programming is at 
clojure.net. 
Daniel Eklund

Daniel is Principal Consultant for Think Big Analytics, 
and a software architect and technologist with over 15 
years of experience in enterprise software develop¬ 
ment. Daniel works with leading Fortune 500 customers to imple¬ 
ment Big Data/NoSQL storage and compute systems using 
Hadoop, Hive, Pig, Cascading, Cassandra and HBase. 
Amandeep Khurana

Amandeep is a Solutions Architect at Cloudera, where 
he is involved in building out solutions using compo¬ 
nents in the Hadoop ecosystem. He is also a co-author 
of the “HBase in Action” book. Prior to joining Cloudera,
Amandeep was at Amazon Web Services and was a part of the
Elastic Map/Reduce team. 



David Gleich 

David is an Assistant Professor of Computer Science at 
Purdue University, and is interested in how we can uti¬ 
lize matrix algebra to express — and improve — algo¬ 
rithms in network analysis and data-based simulation 
analysis. David's research straddles a few different areas and often 
involves working with large datasets on high-performance com¬ 
puting architectures like MPI clusters, and data computing archi¬ 
tectures, such as Map/Reduce. 

Mark Grover 

Mark is a Software Engineer at Cloudera and a con¬ 
tributor to the Apache Hive open-source project. He is 
also a section author of O'Reilly's book on Apache
Hive called “Programming Hive.” Mark is an active respondent
on the Hive mailing list and IRC channel.

David Hwang 

David is the Founder and Chief Scientist of Restaurant 
Sciences. In his current role, he and his research team 
analyze data from thousands of restaurants across 
North America, and deliver insights back to industry 
suppliers on America's eating and drinking patterns. Mongo, R, 
Mahout, Python, D3 and ZFS all comprise the technology base of 
his current project. He is a true testament to what can be accomplished
with sub-normal intelligence, open-source software and a
penchant for hard work. 

Dave Jespersen 

Dave brings his deep engineering experience to his 
role of chief customer advocate at MapR Technolo¬ 
gies. He enriches the customer experience by working 
with MapR's customer base to develop and imple¬ 
ment innovative solutions to the complex problems faced by 
every enterprise. 

He was previously VP of Engineering at MapR, where he led the 
development of MapR's industry-leading products. Dave has 30 
years of successful enterprise software development experience 
in both small and large companies, including EMC, Sun Mi¬ 
crosystems, Sterling Software, Spectra Logic, Exabyte and DEC. 
Dave was educated at Brigham Young University, where he earned 
a BS M.E. and a minor in Computer Science. 





Lynn Langit

Lynn is an independent consultant who specializes in
database technologies (SQL and NoSQL, both on
premise and cloud-based). Lynn has published three
books on SQL Server Business Intelligence, and has
created a set of courseware to introduce children to programming
at www.TeachingKidsProgramming.org. Lynn has been recognized
by Google and 10gen for technical community contributions.
Google awarded her the Google Developer Expert award,
and 10gen added her to the list of MongoDB Masters. Since leaving
a four-year stint at Microsoft in October 2011, Lynn has been
working on building Big Data and BI solutions with customers 
from the education, manufacturing and hospitality sectors. She 
has authored courseware on SQL Server 2012 for DevelopMentor. 
Read her blog at www.LynnLangit.com. 

Boris Lublinsky 

Boris is a principal architect with Nokia, where he is actively par¬ 
ticipating in Big Data, SOA, BPM and middleware implementa¬ 
tions. Prior to this, Boris was a principal architect at Herzum 
Software, where he designed large-scale SOA systems for clients; 
and an enterprise architect at CNA Insurance, where he was in¬ 
volved in designing and implementing CNA's integration strategy, 
building application frameworks and implementing service-ori¬ 
ented architectures. 

Boris has more than 25 years' experience in enterprise and 
technical architecture, and software engineering. He has more 
than 80 technical publications in different magazines, including 
Distributed Computing, Nuclear Instruments and Methods, Java
Developer's Journal, XML Journal, Web Services Journal, and EAI
Journal. Boris is also a co-author of “Applied SOA: Service-Oriented
Architecture and Design Strategies” and an SOA news editor
at Infoq.com.
Faculty

Jock Mackinlay 

Jock is Tableau Software’s Senior Director of Visual 
Analysis. At Stanford University, he pioneered the au¬ 
tomatic design of graphical presentations of relational 
information. He joined Xerox PARC in 1986, where he 
collaborated with the User Interface Research Group to develop 
many novel applications of computer graphics for information 
access, coining the term “Information Visualization.” Much of the 
fruits of this research can be seen in his book, “Readings in Infor¬ 
mation Visualization: Using Vision to Think.” Jock has a Ph.D. in 
computer science from Stanford University. 

Dan Mallinger 

Dan is the Data Science Team Lead at Think Big Ana¬ 
lytics. His experience in Big Data and Hadoop has run 
the gamut of customer data, hardware networking 
data, social data, gaming data, and more. Dan’s work 
in these verticals includes research, analysis and parallelization of 
techniques. In prior lives, he has worked as a statistician in a vari¬ 
ety of settings—including research in K-12 education—has expe¬ 
rience architecting systems, and was briefly a high school math 
teacher. 

Claudia Perlich

Claudia is Chief Scientist at Media6Degrees, a startup
that specializes in targeted online display advertising.
Claudia received her Ph.D. in Information Systems
from the Stern School of Business at New York University
in 2005, and holds additional graduate degrees in Computer
Science. Claudia joined the Data Analytics Research group at the
IBM T.J. Watson Research Center in 2004, and continued her
research on data analytics and machine learning for complex
real-world domains and applications.

She is the author of 50+ scientific publications; holds multiple
patents in the area of machine learning; has won various
data-mining competitions and best paper awards; and speaks regularly
at conferences and other public events. In addition, she teaches
courses on Data Mining and Business Analytics at Stern, and
gives guest lectures at Wharton, Columbia and MIT.

Simon Metson

Simon is an Engineer at Cloudant, and also a particle
physicist. For the past 10 years, he worked on the distributed
computing system for one of the Large
Hadron Collider experiments, aggregating petabytes of relational
and non-relational data sources. In 2012, Simon joined Cloudant,
where he designs software and creates educational material to
help developers interact directly with massive datasets.

Jason Rutherglen

Jason is a senior architect at Think Big Analytics. He is co-authoring
the upcoming “Definitive Guide on Lucene and Solr” as well
as “Programming Hive.” He has given several talks at conventions,
including Strata and Cassandra Summit. He co-presented a tutorial
on distributed search at Strata New York. He has taught the
Map/Reduce/Hive/Pig class through Think Big Academy.

Michael Miller

Mike is Chief Scientist at Cloudant, where he develops
and evangelizes the company’s technical vision and
manages long-term product R&D. While at MIT as a
Postdoctoral Fellow, he cofounded Cloudant after cutting
his teeth on petabyte-per-second problems at the Large
Hadron Collider. Mike holds a B.S. in Physics and a B.A. in Philosophy
from Michigan State University, a Ph.D. in Physics from Yale
University, and is an Affiliate Professor of Particle Physics at the
University of Washington. He has more than a decade’s experience
as a builder of the most extreme Big Data systems on earth,
as well as extensive experience lecturing on mathematics,
physics, data science, and philosophy at the graduate and
undergraduate level.

Andrew C. Oliver

Andrew is the president of Open Software Integrators,
a US firm specializing in NoSQL/Big Data development
with offices in Chicago and Durham, N.C.
Andrew is the “Strategic Developer” columnist for
InfoWorld, which focuses primarily on application development,
largely using cloud and NoSQL technologies. He is also a well-known
open-source advocate (having served on the board of the
Open Source Initiative), was the founder of the POI project, and
worked for successful startup JBoss before it was acquired by
Red Hat.



Eddie Satterly 

Eddie is Chief Big Data Evangelist at Splunk and has 
served in a variety of roles, including developer, engineer,
architect and CTO over his 23-year career. He has
been a long-time Big Data user, even before it was the
cool thing to do. More recently, he was able to revolutionize the
way a leading online travel agency delivers its core Web applications,
which resulted in an improved user experience. He created a
highly scalable and flexible Big Data environment using best-in¬ 
breed tools, and as a result, was able to retire 35 other systems. 
Eddie has done guest lectures at universities, and presents at sev¬ 
eral conferences and symposiums yearly. He is a recognized ex¬ 
pert in the field of Big Data and has presented at many global 
conferences on the topic. Eddie has a B.S. in Computer Science 
from Indiana University. 

Michael Segel 

Michael is a principal consultant with Think Big Analytics. As a 
principal, he is involved in working with clients, assisting with 
their strategy and implementation of Hadoop. Michael is also in¬ 
volved as an instructor with Think Big’s Academy, teaching 
courses on Hadoop Development in Java, Hive and Pig, along 
with HBase. 

Prior to joining Think Big, Michael ran his own consulting firm, 
developing solutions for customers around the Chicago area. 
Since 2009, Michael has been working primarily in the Big Data 
Space. He also founded the Chicago Hadoop User Group 
(CHUG). 

Michael received his bachelor’s degree in Computer Science 
from the College of Engineering at Ohio State University. 

Jonathan Seidman 

Jonathan is a Solutions Architect on the Partner Engi¬ 
neering team at Cloudera. Before joining Cloudera, he 
was a Lead Engineer on the Big Data team at Orbitz 
Worldwide, helping to build out the Hadoop clusters 
supporting the data storage and analysis needs of one of the most 
heavily trafficked sites on the Internet. Jonathan is also a founder 
and organizer of the Chicago Hadoop User Group and the 
Chicago Big Data Meetup, and a frequent speaker on Hadoop and 
Big Data at industry conferences such as Hadoop World, Strata 
and OSCON. 

Ryan Tabora 

Ryan is a data developer at Think Big Analytics. He has 
hands-on experience with Fortune-500 clients work¬ 
ing on Big Data solutions, including technologies like 
Hadoop, HBase, Hive, Cassandra and Solr. He is co¬ 
authoring the upcoming “Definitive Guide on Lucene and Solr.” 
He co-presented a tutorial on distributed search at Strata New 
York. He helped develop the coursework for the Think Big Acad¬ 
emy HBase and Solr class. 

Ashish Thusoo

Ashish, CEO and co-founder of Qubole, a pioneering
Big Data startup, is also the co-creator of Apache Hive
and served as the project’s founding Vice President at 
the Apache Software Foundation. He started his ca¬ 
reer as an engineer at Oracle, where he contributed heavily to 
many core components of Oracle RDBMS. Ashish also ran the 
Data Infrastructure team at Facebook, leading the team in the 
creation of one of the largest data-processing and analytics plat¬ 
forms in the world, a platform that achieved the bold aim of 
making data accessible to analysts, engineers and data scientists 
alike within the company. Ashish has a Bachelor’s degree in CS 
from IIT-Delhi, and a Master’s degree in CS from University of 
Wisconsin-Madison. 
Kathleen Ting

Kathleen is a Support Manager at Cloudera and a
committer on the Apache Sqoop project. She has spoken
at many Big Data conferences, including Hadoop
World on Map/Reduce, HBaseCon on HBase,
Strange Loop on ZooKeeper, and Hadoop Summit on Sqoop.



Ravi Veeramachaneni 

Ravi is Principal Architect at RSA, a security division of 
EMC, where he architects and implements Big Data 
solutions for the Archer platform. Ravi is also an En¬ 
terprise Architect and Technology leader with 20+ 
years of software design, development and architecture experi¬ 
ence, specializing in e-business, service-oriented architecture, 
business-to-business integration, large-scale system design, 
cloud computing and Big Data. His experience spans a broad 
range of industries, including banking, insurance, supply chain, 
B2B e-commerce, geo-spatial, and retail. He has played various 
technology leadership roles at Discover Financial, Key Corp., 
Navteq, Nokia and Informatica, to name a few.

He has spoken at various industry events for the last couple of 
years, including Chicago Data Summit, Hadoop World, CAMP IT 
conference, Enterprise Data World, Informatica World, and Infor¬ 
matica User Groups. Ravi also frequently speaks at Navteq inter¬ 
nal TechTalks on a variety of technical topics. 

Dean Wampler 

Dean is Principal Consultant at Think Big Analytics, 
specialists in Big Data, particularly in using the 
Hadoop ecosystem of tools. He speaks frequently at 
conferences on various Big Data and other program¬ 
ming topics. Dean is the co-author of “Programming Hive,” the 
author of “Functional Programming for Java Developers,” and the 
co-author of “Programming Scala.” 

Andrew Wilson 

Andy is a Solutions Architect for VoltDB. He has been a
software developer for 16 years, writing both desktop
and Web applications. Most recently, Andy was the
Software Architect for Harvard Business Publishing’s
Higher Education e-commerce site, which sells course materials 
for MBA students around the world. He currently travels across 
the country speaking at Node.js meetup groups, most recently at 
QCon NY. He was also an adjunct professor at Rivier University 
teaching graduate computer science courses. 
Hotel & Travel

Big Data TechCon will be held at the Hyatt Regency Cambridge. 
Hyatt Regency Cambridge, Overlooking Boston
575 Memorial Drive 
Cambridge, MA 02139-4896 
Phone: +1-617-492-1234
Fax: +1-617-491-6906
www.cambridge.hyatt.com 

Reservations 

Special Discounted Rates 

Room rates for Big Data TechCon 
attendees are US$199 per night for 
single/double occupancy. 

Rooms for the reduced rate are limited and are available on a 
first-come, first-served basis.

Click here to make your hotel reservation 

(or use the “Make Hotel Reservation” option on 
the confirmation page of your registration). 

Reservations at the reduced rate can be made through 5:00 PM 
Eastern time on March 18, 2013—assuming they don't sell out. 
The number of rooms in the discounted block is LIMITED, and 
historically rooms sell out well before the deadline. Don't wait 
until the last minute to reserve your hotel rooms! 

This rate is available throughout Big Data TechCon. Those who 
reserve their hotel rooms via this reservation link (our room 
block) will receive: 

• Complimentary wireless Internet service in their rooms. 

• Overnight self-parking discounted to $25 per day, and daily 
self-parking discounted to $15 per day. 

Hotel Highlights 

Minutes from Boston, the Hyatt Regency Cambridge hotel is 
located along the scenic Charles River overlooking the Boston 
skyline, and is in the midst of two uncommonly exciting cities, 
Boston and Cambridge. Each exhibits a unique blend of 
old-world charm coupled with youthful, contemporary 
sophistication. Cambridge is the spirit side of Boston, just 
a bridge away on the historic side. 





Driving Directions 

From the West 

From Mass Pike: Take exit 18 - Allston/Cambridge (left-hand 
exit). Follow the signs for Cambridge. Cross the River Street 
Bridge. Exit right at the end of the bridge onto Memorial Drive. 
Hyatt is half a mile up on Memorial Drive (Route 3) on the left- 
hand side. Turn left at the traffic light to access the hotel entrance 
and parking garage. 

From the South 

From I-93: Take exit 26 - Storrow Drive/Back Bay/Cambridge.

Stay in the right lane (Storrow Drive). Go 3/4 of a mile and take 
the second exit on the left, Government Center/Kendall Square. 
Go up the ramp and stay in the right lane. Turn right at the stop 
sign. Go across the Longfellow Bridge. Take the first left off the 
bridge and turn onto Memorial Drive (Route 3). Stay on Memorial 
Drive for approximately 1 mile on the right. At the light, turn 
right. Hyatt is on your left. 

From the North 

From I-93 or Route 1: Take exit 26 - Storrow Drive/Back Bay/
Cambridge and follow “From the South” directions. 

From Logan International Airport 

Follow signs to the Mass Turnpike: 90 Boston/Williams Tunnel. 
Pay toll upon exiting airport. You will be heading west on 90. 

From 90 West, take exit 20 - Brighton/Cambridge (pay another 
toll). Out of the tollbooth, bear right following the sign for the 
Cambridge/Somerville exit. From this exit ramp, stay straight 
through two lights and cross over the bridge. After the bridge, 
turn right at the light onto Memorial Drive. Stay in the left lane. 
Continue on the overpass in the left lane. At the first light on 
Memorial Drive, make a left and then another quick left into the 
Hyatt front circle. 

Current parking rates 

Self-Parking: $35 (plus 14% tax) per day (24 hours) 

Valet Parking: $42 (plus 14% tax) per day (24 hours) 
Getting Approval

Gotta Get Approval? Try These 11 Time-Tested Tactics: 

1. STUDY. Note the HOW-TO classes and workshops at Big Data
TechCon focused on the latest Big Data technologies, especially
those that are best aligned with your company's existing IT
infrastructure. Say that this is your first, and most practical,
opportunity to bring Big Data to your business.

2. PREPARE. Download the course catalog and circle the classes
you want to take, and explain why the topics relate to your Big Data
technical efforts. Show that you have found many sessions that fit
your specific needs, and your company's strategic goals.

3. JUSTIFY. Go in armed with all the necessary materials to
make a good case for how your attending Big Data TechCon will
help your company make money, save money or improve
productivity by helping you capture and analyze the data that
drives your business.

4. SHARE. Promise to come back from Big Data TechCon and
hold a brown-bag lunch session to share what you've learned
with your colleagues, or even conduct formal training within your
department. In fact, maybe you'll want to schedule a series of
brown-bag lunches.

5. PLAN. Tell management that after you attend Big Data
TechCon, you'll make definite action plans and recommendations to
implement new Big Data plans, and to improve how your
company uses all of the data it captures.





6. RELATE. Show how problems or issues you've recently
encountered fit with the classes at Big Data TechCon, and discuss
the types of technology discussions you'll have with the
conference faculty and other IT professionals.

7. SAVE. The tuition and travel expense of attending Big Data 
TechCon is less than many other conferences. The earlier you sign 
up, the more you save, so explain the benefit of signing up early, 
and for booking your hotel room before the cutoff. 

8. TEAM. Save even more with group discounts. Send three or
more employees from your company, and save $100 per person.
Each person can take different classes and bring back even
more valuable tips and techniques. (Sending 10 or more? Contact
us for arrangements.)

9. GROUP. User groups, government employees, non-profits
and professionals employed by or attending educational
institutions can also receive special savings. Check the website or ask
Stacy Burris (sburris@bzmedia.com) about custom options for
your group.

10. LAUNCH. Classes at Big Data TechCon help you get a
jump-start on every aspect of Big Data that you have been talking about
implementing (but haven't) for months. Whether it's Hadoop,
graph databases, NoSQL or another new technology, explain that
you'll find the answers here.


11. DECIDE. While you can sign up anytime, your company
will save the most if you beat the deadlines. Explain that you 
will help your company’s bottom line by signing up for 
Big Data TechCon today! 
                      Register by   Register by   Register by   After
                      Jan. 11       Feb. 22       March 22      March 22

Three-Day Conference  $1,095        $1,195        $1,295        $1,495
(April 8-10)          SAVE $400     SAVE $300     SAVE $200

Exhibit Hall Only     FREE          FREE          FREE          $50
(April 9-10)





Register Online TODAY at www.BigDataTechCon.com! 


Three-Day Conference 

Registration Includes: 

• Admission to workshops and 
technical classes on April 8, 9 and 10 

• Admission to keynotes 

• Admission to the Exhibit Hall 

• Admission to all special events, 
including the Networking Reception 

• Downloadable conference materials 

• Coffee breaks and lunch 
where indicated 

Exhibit Hall Only 

Registration Includes: 

• Admission to the Exhibit Hall 

• Admission to Networking Reception 


Big Data gets Real 
at Big Data TechCon! 



How to Register 

Register online and use one of the 
following payment methods: 

Credit Card. You can use the secure 
online form to pay via credit card and 
get immediate confirmation of your
registration. MasterCard, Visa and American
Express are accepted. You'll receive a 
registration record and receipt. Please 
print out these pages and bring them with 
you to the Conference. Present them at the 
Registration Desk to pick up your badge 
and course materials. 

Check. Fill out the online registration 
form. Print out the registration record and 
receipt and mail them to BZ Media LLC,
7 High Street, Suite 407, Huntington, NY
11743, with your payment. Online registrations
that are mailed without payment will
not be confirmed until payment is received.

Purchase Order. If you register using 
a P.O., you'll be invoiced immediately for 
the registration amount. Payment must 
be received before your registration can be 
confirmed. 

Special Discounts 

You may combine one of these special 
discounts with the Early Registration 
pricing to save even more! 

Group. Group discounts will be given 
automatically if you register three or more 
people at once. You can also contact Stacy 
Burris at sburris@bzmedia.com to receive 
the $100/person discount if your group 
is unable to register at the same time. 
Contact her also for special discounts for 
groups of 10 or more. 

Government Employees. Federal, State
and Local Government employees can
receive an additional $100 off
the Three-Day Conference
price. Enter code GOV in the discount
code field. CCR-registered indicates that
we are listed in the primary supplier database
for the Federal Government.

Educational Institutions. Personnel 
employed by or attending educational 
institutions can get a $100 discount off the 
Three-Day Conference price by using the 
code EDU. 

User Groups. Contact Stacy Burris at 
sburris@bzmedia.com to see if your group 
is eligible for a discount. 

Non-Profit Organizations. Personnel 
employed by non-profit organizations can 
get a $100 discount off the Three-Day 
Conference price by using the code 
NONPROFIT. 

Cancellation and Refund Policy 

You can receive a full refund, less 
a $150 registration fee, for cancellations 
made by Friday, Feb. 22, 2013. Cancellations
after this date are non-refundable.
Send your cancellation in writing to 
registration@bzmedia.com. Registrations 
may be transferred to another person. 

Refunds will be processed through the 
same method of payment as the initial 
payment transaction. Credit-card refunds 
will be processed to the same credit card 
as the original payment. 

If for reasons beyond our control 
the conference cannot take place as 
scheduled, BZ Media reserves the right to 
reschedule the conference to a date and 
place of its choosing.

Questions 

Contact Stacy Burris, Event Director, at 
sburris@bzmedia.com or 
+1-631-421-4158 x108.













